Introduction to R is brought to you by the Centre for the Analysis of Genome Evolution & Function (CAGEF) bioinformatics training initiative. This course was developed based on feedback on the needs and interests of the Department of Cell & Systems Biology and the Department of Ecology and Evolutionary Biology.
The structure of this course is code-along style; it is 100% hands-on! A few hours prior to each lecture, links to the materials will be available for download on Quercus. The teaching materials consist of an R Markdown Notebook with concepts, comments, instructions, and blank coding spaces that you will fill out by coding along with the instructor. Other teaching materials include a live-updating HTML version of the notebook and datasets to import into R when required. This learning approach will let you spend your time coding rather than taking notes!
As we go along, there will be some in-class challenge questions for you to solve either individually or in cooperation with your peers. Post-lecture assessments will also be available through DataCamp (see the syllabus for the grading scheme and percentages of the final mark) to help cement and/or extend what you learn each week.
We’ll take a blank slate approach here to R and assume that you pretty much know nothing about programming. From the beginning of this course to the end, we want to take you from some potential scenarios such as…
A pile of data (like an Excel file or tab-separated file) full of experimental observations that you don't know what to do with.
Maybe you're manipulating large tables entirely in Excel, making custom formulas and pivot tables with graphs. Now you have to repeat similar experiments and do the analysis again.
You’re generating high-throughput data and there aren’t any bioinformaticians around to help you sort it out.
You heard about R and what it could do for your data analysis but don’t know what that means or where to start.
and get you to a point where you can…
Format your data correctly for analysis.
Produce basic plots and perform exploratory analysis.
Make functions and scripts for re-analysing existing or new data sets.
Track your experiments in a digital notebook like R Markdown!
In the first lesson, we will talk about the basic data structures and objects in R, get cozy with the R Markdown Notebook environment, and learn how to get help when you are stuck - because everyone gets stuck, a lot! Then you will learn how to get your data in and out of R, how to tidy your data (data wrangling), and then subset and merge data. After that, we will dig into the data and learn how to make basic plots for both exploratory data analysis and publication. We'll follow that up with data cleaning and string manipulation; this is really the battleground of coding - getting your data into just the right format where you can analyse it more easily. We'll then spend a lecture digging into the functions available for the statistical analysis of your data. Lastly, we will learn about control flow and how to write customized functions, which can really save you time and help scale up your analyses.
Don’t forget, the structure of the class is a code-along style: it is fully hands on. At the end of each lecture, the complete notes will be made available in a PDF format through the corresponding Quercus module so you don’t have to spend your attention on taking notes.
There is no single correct path from A to B - although some paths may be more elegant or more efficient than others. With that in mind, the emphasis in this lecture series will be on the tidyverse series of packages. This resource is well-maintained by a large community of developers. While not always the “fastest” approach, this additional layer can help ensure your code still runs (somewhat) smoothly later down the road.

This is the second in a series of seven lectures. Last lecture we discussed the basic functions and structures of R as well as how to navigate them. This week we will focus more on the data.frame object and learn how to manipulate the information it holds.
At the end of this session you will be familiar with importing data
from plain text and excel files; filtering, sorting, and re-arranging
your data.frames using the dplyr package; the
concept of piping command calls; and writing your resulting data to
files. Our topics are broken into:
Using the dplyr package to filter, subset, and manipulate your data and to perform simple calculations.

Grey background: Command-line code, R library and function names. Backticks are also used for in-line code.
... fill in the code here if you are coding along
Blue box: A key concept that is being introduced
Yellow box: Risk or caution
Green boxes: Recommended reads and resources to learn R
Red boxes: A comprehension question which may or may not involve a coding cell. You usually find these at the end of a section.
Each week, new lesson files will appear within your RStudio folders.
We are pulling from a GitHub repository using this Repository
git-pull link. Simply click on the link and it will take you to the
University of Toronto datatools
Hub. You will need to use your UTORid credentials to complete the
login process. From there you will find each week’s lecture files in the
directory /2025-09-IntroR/Lecture_XX. You will find a
partially coded skeleton.Rmd file as well as all of the
data files necessary to run the week’s lecture.
Alternatively, you can download the R-Markdown Notebook
(.Rmd) and data files from the RStudio server to your
personal computer if you would like to run independently of the Toronto
tools.
A live lecture version will be available at camok.github.io that will update as the lecture progresses. Be sure to refresh to take a look if you get lost!
As mentioned above, at the end of each lecture there will be a completed version of the lecture code released as an HTML file under the Modules section of Quercus.
The following datasets used in this week's class come from a manuscript published in PLoS Pathogens entitled “High-throughput phenotyping of infection by diverse microsporidia species reveals a wild C. elegans strain with opposing resistance and susceptibility traits” by Mok et al., 2023. These datasets focus on an analysis of infection in wild isolate strains of the nematode C. elegans by environmental pathogens known as microsporidia. The authors collected embryo counts from individual animals after population-wide infection by microsporidia, and we'll spend our next few classes working with the dataset to learn how to format and manipulate it.
This is a comma-separated version of the metadata data from our measurements. This dataset tracks information for each experimental condition measured including experimental dates, reagent versions, and sample locations. We’ll use this file to ease our way into importing, manipulating, and exporting in today’s class.
This is a series of amalgamated datasets that we will use to show how we can import even entire Excel books into R. This file contains two sheets containing experimental measurements as well as the experimental metadata from Dataset 1.
Packages are groups of related functions that serve a purpose. They can be a series of functions to help analyse specific data or they could be a group of functions used to simplify the process of formatting your data (more on that later in this lecture!).
Depending on their structure they may also rely on other packages.
There are a few different places from which you can install R packages. Listed in order of decreasing trustworthiness:
CRAN (The Comprehensive R Archive Network)
Bioconductor (Bioinformatics/Genomics focus)
GitHub
Joe’s website
Regardless of where you download a package from, it's a good idea to document its installation, especially if you had to troubleshoot it (you'll eventually be there, I promise!).
devtools is a package used by developers to make R packages, but it also helps us install packages from GitHub. It is downloaded from CRAN.
Installing packages through your RStudio instance is relatively straightforward, but any packages you install only remain during your current instance (login) of the hub. Whenever you log out of the JupyterHub (or datatools.utoronto.ca), these installed libraries will essentially vaporize.
The install.packages() command will work just as it
should in a desktop version of RStudio.
# Always keep installation commands commented out
install.packages('devtools')
## Warning: package 'devtools' is in use and will not be installed
R may give you package installation warnings. Don’t panic. In general, your package will either be installed and R will test if the installed package can be loaded, or R will give you a non-zero exit status - which means your package was not installed. If you read the entire error message, it will give you a hint as to why the package did not install.
Some packages depend on previously developed packages and
can only be installed after another package is installed in your
library. Similarly, that previous package may depend on another package
and so on. To solve this potential issue we use the
dependencies logical parameter in our call.
install.packages('devtools', dependencies = TRUE)
## Warning: package 'devtools' is in use and will not be installed
# remove.packages("devtools") # Uninstall any CRAN package
library() to load your packages after installation

A package only has to be installed once; it is now in your library. To use a package, you must load it into memory. Unless it is one of the packages R loads automatically, you choose which packages to load every session.
Installing libraries on datatools.utoronto.ca: Unlike on a personal installation of RStudio, we are running through an RStudio server which creates a fresh “instance” of an RStudio installation each time you log in. Some packages are pre-installed by system administrators, but any packages outside of these essential ones will need to be installed every time you restart your RStudio instance. Keep that in mind!
library() takes a single argument. library() will throw an error if you try to load a package that is not already installed. You may see require() on help pages, which also loads packages; it is usually used inside functions because it gives a warning instead of an error if a package is not installed.
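The difference is easiest to see with a package name that is (presumably) not installed - "notarealpkg" below is a made-up name used only for illustration:

```r
# library() on a missing package halts the script with an error:
# library(notarealpkg)
# Error in library(notarealpkg) : there is no package called 'notarealpkg'

# require() instead returns FALSE (with a warning), so execution continues:
ok <- suppressWarnings(require("notarealpkg", character.only = TRUE, quietly = TRUE))
if (!ok) message("Package not available; continuing without it.")
```

This is why require() is handy inside functions that can degrade gracefully when an optional package is missing.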
Errors versus warnings: So far we've seen that errors stop code from running, while warnings let code continue to run. A warning may never lead to an error, but it certainly leaves your code vulnerable to errors down the road.
# When we try to load this we will likely receive an error due to an older package being loaded
# Restart the kernel! It will keep the installed libraries but will unload the offending package.
library(devtools)
# or
#library('devtools')
BiocManager

To install from Bioconductor, you can use the BiocManager package to pull down and install other packages from the Bioconductor repository.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager") # this piece of code checks if BiocManager is installed.
# If it is not installed, it will install it for you. It does nothing if BiocManager is already installed.
# If you run this, it could take a while
BiocManager::install("GenomicRanges")
## 'getOption("repos")' replaces Bioconductor standard repositories, see
## 'help("repositories", package = "BiocManager")' for details.
## Replacement repositories:
## CRAN: https://cran.r-project.org
## Bioconductor version 3.12 (BiocManager 1.30.22), R 4.0.5 (2021-03-31)
## Warning: package(s) not installed when version(s) same as or greater than current; use
## `force = TRUE` to re-install: 'GenomicRanges'
## Installation paths not writeable, unable to update packages
## path: C:/Program Files/R/R-4.0.5/library
## packages:
## boot, class, cluster, codetools, crayon, evaluate, foreign, KernSmooth,
## lattice, mgcv, nlme, nnet, pbdZMQ, rpart, spatial, survival
## Old packages: 'abind', 'ade4', 'ape', 'backports', 'basefun', 'bdsmatrix', ...
#or
#BiocManager::install(c("GenomicRanges", "ConnectivityMap"))
package::function()

As mentioned above in section 1.1.0, devtools is required to install from GitHub. We don't actually need to load the entire devtools library if we are only going to use one function. We can select a single function using the syntax package::function().
Directly accessing functions Sometimes we load libraries that contain the same function names! While these functions may behave completely differently, how does the R interpreter know which one we are referring to? By default, R will use the most recently loaded version of a function. By using the package::function() syntax, we can tell R exactly which version of “conflicting” functions we wish to use!
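As a small illustration (not from the lecture data), dplyr and base R's stats package both define a filter() function; the :: syntax disambiguates them:

```r
library(dplyr)

# With dplyr loaded, plain filter() refers to dplyr's row-subsetting verb,
# which masks stats::filter(). The :: syntax reaches each one explicitly.
six_cyl  <- dplyr::filter(mtcars, cyl == 6)    # subset rows of a data frame
smoothed <- stats::filter(1:10, rep(1/3, 3))   # apply a moving-average filter

nrow(six_cyl)
```

Loading dplyr actually prints a message about exactly this masking - now you know what it means.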
devtools::install_github("tidyverse/googlesheets4")
## Using github PAT from envvar GITHUB_PAT
## Skipping install of 'googlesheets4' from a github remote, the SHA1 (55cd9fdb) has not changed since last install.
## Use `force = TRUE` to force installation
All packages are loaded the same regardless of their origin, using
library().
# Load googlesheets4 now from the library
library(googlesheets4)
The following packages are used in this lecture:
tidyverse (installing tidyverse installs several packages for you, like dplyr, readr, readxl, tibble, and ggplot2)
writexl (used for writing multiple datasets to Excel files)

#--------- Install packages for today's session ----------#
#install.packages("tidyverse", dependencies = TRUE) # This package should already be installed on Jupyter Hub
# This package should NOT already be installed on the RStudio server
if(!require("writexl")) install.packages("writexl", dependencies = TRUE)
#--------- Load packages for today's session ----------#
library(tidyverse)
# readxl, used for reading xlsx files, is installed with tidyverse but is not a core component when loading tidyverse
library(readxl)
library(writexl)
The most important thing when starting to work with your data is knowing how to load it into the memory of the R kernel. There are a number of ways to read in files; each is suited to specific file types or sizes, or may perform better depending on how you wish to read/store the file (all at once, a line at a time, or somewhere in between!).
There are many file formats you may come across in your journey but the most common will be CSV (comma-separated values), TSV (tab-separated values), FASTQ (usually used for storing biological sequences), or some archived (ZIP, GZ, TGZ) version of these. R is even able to open these archived versions in their native format! We may interchangeably use the word parsing to describe the action of reading/importing formatted data files.
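As a hedged sketch of that native archive support: readr's read functions decompress gzipped text transparently. Here we write a small compressed CSV to a temporary path (created on the fly) and read it straight back:

```r
library(readr)

# write_csv() compresses automatically based on the .gz extension,
# and read_csv() decompresses transparently on import.
tmp <- tempfile(fileext = ".csv.gz")
write_csv(data.frame(x = 1:3, y = c("a", "b", "c")), tmp)

dat <- read_csv(tmp, show_col_types = FALSE)
dat
```

No explicit unzipping step is needed on either side.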
tibble with read_csv()

The tidyverse package has its own function for reading in text files because the tibble structure was first developed as part of the dplyr package! We'll spend some time learning more about the differences between the tibble and data.frame objects in section 2.3.2.
Since we'll be spending our time working with the tidyverse, we may as well use its commands for importing files! If you want to learn how to do this with the base R utils package, check out the Appendix section for details.
Let’s look quickly at the read_csv() function which is a
specific version of the read_delim() function from the
readr package. The parameters we are interested in are:
file: The path to the file you want to import.
col_names: TRUE (there is a header), FALSE (import without column names), or supply a character vector of custom names you want to use for your data columns.
col_types: NULL (default; read_csv decides on column types itself) or a cols() specification of the data type for each column. Find more information in the ?read_csv details.
na: a character vector of strings to interpret as NA values. Very handy when you have values you want to identify and convert at import.

From this point on, we'll pretty much use the terms tibble and data.frame interchangeably.
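To see the na parameter in action before tackling the real file, here is a toy example (the inline CSV is made up; I() tells readr to treat the string as literal data rather than a file path):

```r
library(readr)

# "ND" and "n/a" are declared as missing-value markers, so they are
# converted to NA on import and the score column stays numeric.
toy <- read_csv(I("id,score\n1,10\n2,ND\n3,n/a\n"),
                na = c("ND", "n/a"),
                show_col_types = FALSE)
toy$score
```

Without the na argument, those sentinel strings would force the whole column to be read as character.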
# ?read_csv
# Import our infection_meta.csv file from the data folder
infection_meta.tbl <- read_csv(file = "data/infection_meta.csv",
col_names = TRUE,
col_types = cols()
# Producing a blank cols() specification suppresses any read_csv() output
)
# Check out the structure of our table
str(infection_meta.tbl)
## spc_tbl_ [276 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ experiment : chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
## $ experimenter : chr [1:276] "CM" "CM" "CM" "CM" ...
## $ description : chr [1:276] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection Date : num [1:276] 190423 190423 190423 190423 190423 ...
## $ Plate Number : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
## $ Worm_strain : chr [1:276] "VC20019" "VC20019" "VC20019" "N2" ...
## $ Total Worms : num [1:276] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore Strain : chr [1:276] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## $ Spore Lot : chr [1:276] "2A" "2A" "2A" "2A" ...
## $ Lot concentration: num [1:276] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## $ Total Spores (M) : num [1:276] 0 10 20 0 10 20 0 10 20 0 ...
## $ Total ul spore : num [1:276] 0 56.8 113.6 0 56.8 ...
## $ Infection Round : num [1:276] 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num [1:276] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate Size : num [1:276] 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores(M)/cm2 : num [1:276] 0 0.354 0.708 0 0.354 ...
## $ Time plated : num [1:276] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
## $ Time Incubated : num [1:276] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
## $ Temp : num [1:276] 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr [1:276] "72" "72" "72" "72" ...
## $ infection.type : chr [1:276] "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num [1:276] 190426 190426 190426 190426 190426 ...
## $ Location : chr [1:276] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining Date : num [1:276] 190513 190513 190513 190430 190513 ...
## $ Stain type : chr [1:276] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
## $ Slide date : num [1:276] 190515 190515 190515 190501 190515 ...
## $ Slide number : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
## $ Slide Box : num [1:276] 2 2 2 2 2 2 2 2 2 2 ...
## $ Imaging Date : num [1:276] 190516 190516 190516 190502 190516 ...
## - attr(*, "spec")=
## .. cols(
## ..   experiment = col_character(),
## ..   experimenter = col_character(),
## ..   description = col_character(),
## ..   `Infection Date` = col_double(),
## ..   `Plate Number` = col_double(),
## ..   Worm_strain = col_character(),
## ..   `Total Worms` = col_double(),
## ..   `Spore Strain` = col_character(),
## ..   `Spore Lot` = col_character(),
## ..   `Lot concentration` = col_double(),
## ..   `Total Spores (M)` = col_double(),
## ..   `Total ul spore` = col_double(),
## ..   `Infection Round` = col_double(),
## ..   `40X OP50 (mL)` = col_double(),
## ..   `Plate Size` = col_double(),
## ..   `Spores(M)/cm2` = col_double(),
## ..   `Time plated` = col_double(),
## ..   `Time Incubated` = col_double(),
## ..   Temp = col_double(),
## ..   timepoint = col_character(),
## ..   infection.type = col_character(),
## ..   `Fixing Date` = col_double(),
## ..   Location = col_character(),
## ..   `Staining Date` = col_double(),
## ..   `Stain type` = col_character(),
## ..   `Slide date` = col_double(),
## ..   `Slide number` = col_double(),
## ..   `Slide Box` = col_double(),
## ..   `Imaging Date` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
As you can see, it’s a pretty smooth process to parse simple text
files. We’ve imported our CSV file and can see it has 276 rows
(observations) and 29 columns (variables). In later sections we’ll learn
some additional functions for manipulating this data object as we become
familiar with the tidyverse package.
readxl package

What happens if we have an Excel file? The readxl package, which is installed as part of the tidyverse package, will recognize both xls and xlsx files. It expects tabular data, which is what these file types hold.
Note that back in section 1.5.0, we loaded the tidyverse package and explicitly loaded readxl so we can use the read_excel() function to accomplish our task. Some parameters we are interested in are:
path: The path to the file you want to import.
sheet: The sheet you want to read, either as a string (i.e. “sheet name”) or an integer (position).
col_names: TRUE (there is a header), FALSE (import as is), or supply a character vector of custom names you want to use for your data columns.
col_types: NULL (default; read_excel decides on column types itself) or a character vector containing the column types listed as “blank”, “numeric”, “date”, or “text”.
na: a character vector of strings to interpret as NA values. Very handy when you have values you want to identify and convert at import.
range: a way to specify a rectangular area of your Excel file to take data from.

First, let's try to open our Excel file with read_csv().
# read_csv() doesn't work for excel files
head(read_csv("data/infection_data_all.xlsx"))
## Multiple files in zip: reading '[Content_Types].xml'
## Rows: 1 Columns: 1
## -- Column specification ------------------------------------------------------
## Delimiter: ","
## chr (1): <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 1 x 1
##   `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>`
##   <chr>
## 1 "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\~
Looks like it didn't work… There is a lot of file metadata stored alongside the actual data. If you opened this as a regular text file, you would see all of that extra information, as we see some of it now. Therefore the .xlsx file cannot be imported correctly with this function.
Now let’s try read_excel().
# The readxl package is not a core component of the tidyverse so we need to load it
require(readxl) # Note that we've already loaded it in section 1.5.0
# let's take a peek at what happens when we import without any extra arguments
head(read_excel("data/infection_data_all.xlsx"))
## # A tibble: 6 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_VC20019_LUAm1~ CM           Wild isola~           190423              1
## 2 190426_VC20019_LUAm1~ CM           Wild isola~           190423              2
## 3 190426_VC20019_LUAm1~ CM           Wild isola~           190423              3
## 4 190426_N2_LUAm1_0M_7~ CM           Wild isola~           190423              4
## 5 190426_N2_LUAm1_10M_~ CM           Wild isola~           190423              5
## 6 190426_N2_LUAm1_20M_~ CM           Wild isola~           190423              6
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores/cm2` <dbl>,
## #   `Time plated` <chr>, `Time Incubated` <chr>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...
excel_sheets()

Why doesn't our output look like a workbook with multiple sheets? The read_excel() function defaults to reading in the first worksheet. You can specify which sheet you want to read in by position or name with the sheet parameter.
How will you know what the sheet names are for your workbook? You can
see the name of your sheets using the excel_sheets()
function which returns a character vector of names as output.
# grab the excel sheet names
excel_sheets("data/infection_data_all.xlsx")
## [1] "infection_metadata" "embryo_data_wide" "microsporidia_info"
read_excel() function

If we want to get fancy, it is possible to subset from a sheet by specifying cell numbers or ranges. Here we are grabbing sheet 1 (infection_metadata) and subsetting cells over a range defined by two cells - A3:D9.
For our purposes, the read_excel() function takes the
default form of
read_excel(path, sheet = NULL, range = NULL) but there are
additional parameters we can supply to the function. See
?read_excel for more information.
# read in a specific sheet and range with read_excel()
read_excel(path = "data/infection_data_all.xlsx",
sheet = 1,
range = "A3:D9")
## # A tibble: 6 x 4
##   `190426_VC20019_LUAm1_10M_72hpi` CM    `Wild isolate phenoMIP retest` `190423`
##   <chr>                            <chr> <chr>                             <dbl>
## 1 190426_VC20019_LUAm1_20M_72hpi   CM    Wild isolate phenoMIP retest     190423
## 2 190426_N2_LUAm1_0M_72hpi         CM    Wild isolate phenoMIP retest     190423
## 3 190426_N2_LUAm1_10M_72hpi        CM    Wild isolate phenoMIP retest     190423
## 4 190426_N2_LUAm1_20M_72hpi        CM    Wild isolate phenoMIP retest     190423
## 5 190426_AB1_LUAm1_0M_72hpi        CM    Wild isolate phenoMIP retest     190423
## 6 190426_AB1_LUAm1_10M_72hpi       CM    Wild isolate phenoMIP retest     190423
Caution: Note from our above example that we no longer have proper column headings! Rather, the column names have been derived from the data in row 3, where our range begins. Normally, if you had your column names in the first row but wanted to jump to a specific row for importing the data, you might include the skip parameter. If you had a complex header of metadata where your true table begins at a later point, then the range parameter is more appropriate. If you simply wanted a subset of the data, you might be better off importing most of what you want and subsetting it from the dataframe after the columns are named. There are many additional ways to subset your data, but it really depends on the level of complexity you wish to achieve with your subsetting. Always try to choose the path of least resistance.
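One way to sketch the “keep the header, skip early rows” pattern is below. This uses writexl to fabricate a throwaway worksheet; the temporary file and the id/value columns are invented for the example:

```r
library(readxl)
library(writexl)

# Fabricate a small workbook: a header row plus five data rows.
tmp <- tempfile(fileext = ".xlsx")
write_xlsx(data.frame(id = 1:5, value = letters[1:5]), tmp)

# Read zero data rows just to capture the column names...
nms <- names(read_excel(tmp, n_max = 0))

# ...then skip the header and the first two data rows, re-attaching
# the saved names so the columns stay properly labelled.
tail_rows <- read_excel(tmp, skip = 3, col_names = nms)
tail_rows
```

Two cheap reads are often simpler than one clever range specification.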
We could alternatively specify the sheet by name. Here we will also
look at how you would simply grab specific rows of data
using the cell_rows() helper function.
That's right - we can supply a function's output as an argument to a parameter!
# read in an excel files by a specific row range
read_excel("data/infection_data_all.xlsx",
sheet = "infection_metadata",
range = cell_rows(1:9))
## # A tibble: 8 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_VC20019_LUAm1~ CM           Wild isola~           190423              1
## 2 190426_VC20019_LUAm1~ CM           Wild isola~           190423              2
## 3 190426_VC20019_LUAm1~ CM           Wild isola~           190423              3
## 4 190426_N2_LUAm1_0M_7~ CM           Wild isola~           190423              4
## 5 190426_N2_LUAm1_10M_~ CM           Wild isola~           190423              5
## 6 190426_N2_LUAm1_20M_~ CM           Wild isola~           190423              6
## 7 190426_AB1_LUAm1_0M_~ CM           Wild isola~           190423              7
## 8 190426_AB1_LUAm1_10M~ CM           Wild isola~           190423              8
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...
Note that if your first row is the header, excluding this row will
result in data filling in the header unless you include the parameter
col_names = FALSE.
Likewise, how would you subset just columns from the same sheet? We can use the cell_cols() helper function for that.
# read in an excel files by a specific column range
head(read_excel(path = "data/infection_data_all.xlsx",
sheet = "infection_metadata",
range = cell_cols("B:D")))
## # A tibble: 6 x 3
##   experimenter description                  `Infection Date`
##   <chr>        <chr>                                   <dbl>
## 1 CM           Wild isolate phenoMIP retest           190423
## 2 CM           Wild isolate phenoMIP retest           190423
## 3 CM           Wild isolate phenoMIP retest           190423
## 4 CM           Wild isolate phenoMIP retest           190423
## 5 CM           Wild isolate phenoMIP retest           190423
## 6 CM           Wild isolate phenoMIP retest           190423
Using the range parameter: to learn more about the range parameter and how to use it with a series of helper functions, you can visit the readxl section on the tidyverse page.
lapply() is the list version of apply(). How would we read in all of the sheets at once? One solution is to use lapply(), a version of the apply() function that we learned about in Lecture 01 (section 4.3.0), to read in all sheets at once.
lapply() uses as input the vector or list
X and returns a list object of
the same length as X. Each element of the returned list is
the result of applying FUN to the corresponding element of
X. Note that the elements of the returned list could be any
kind of object!
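The behaviour described above can be seen with a toy example that needs no external files:

```r
# lapply() returns a list with one element per element of X;
# each element is the result of FUN applied to that element of X
squares <- lapply(X = c(1, 2, 3), FUN = function(x) x^2)
str(squares)
# List of 3
#  $ : num 1
#  $ : num 4
#  $ : num 9

# The elements of the returned list can be any kind of object --
# here, character vectors of different lengths
lapply(X = 1:2, FUN = function(x) rep("a", x))
```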
For our examples, we can use lapply() so that each sheet
from an xlsx file will be stored as a tibble inside of a
list object. Recall that apply() took in a
matrix-like object, a row/column specification (MARGIN),
and a function (FUN).
lapply(), instead, drops the MARGIN
parameter and takes in a vector or a list as the input. Remember that
lists are a single dimension and thus do not have a row/column
configuration. Basic parameters we require are:
X: A vector or list object.
FUN: The function you wish to apply to each element of X.
...: Any additional parameters, which are passed on to FUN as arguments for its parameters.
So far we have been accustomed to functions finding our variables globally (in the global environment); lapply() looks locally (within the function), so we need to explicitly provide our path. We will get more into local vs. global variables in our control flow lesson (lecture 07). For now, just know that we can read in all worksheets from an excel workbook.
#?lapply
# Use lapply and provide a list of excel sheet names, then apply a function to each element (Sheet name) of the list!
excel_sheets_list <- lapply(X = excel_sheets("data/infection_data_all.xlsx"), # this will set X to a character vector
FUN = read_excel, # Note the lack of parentheses!
path = "data/infection_data_all.xlsx" # This is an argument for read_excel()
)
# What is the structure of our sheets_list?
str(excel_sheets_list)
## List of 3
## $ : tibble [276 x 29] (S3: tbl_df/tbl/data.frame)
## ..$ experiment : chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
## ..$ experimenter : chr [1:276] "CM" "CM" "CM" "CM" ...
## ..$ description : chr [1:276] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## ..$ Infection Date : num [1:276] 190423 190423 190423 190423 190423 ...
## ..$ Plate Number : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ Worm_strain : chr [1:276] "VC20019" "VC20019" "VC20019" "N2" ...
## ..$ Total Worms : num [1:276] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## ..$ Spore Strain : chr [1:276] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## ..$ Spore Lot : chr [1:276] "2A" "2A" "2A" "2A" ...
## ..$ Lot concentration: num [1:276] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## ..$ Total Spores (M) : num [1:276] 0 10 20 0 10 20 0 10 20 0 ...
## ..$ Total ul spore : num [1:276] 0 56.8 113.6 0 56.8 ...
## ..$ Infection Round : num [1:276] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ 40X OP50 (mL) : num [1:276] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## ..$ Plate Size : num [1:276] 6 6 6 6 6 6 6 6 6 6 ...
## ..$ Spores/cm2 : num [1:276] 0 0.354 0.708 0 0.354 ...
## ..$ Time plated : chr [1:276] "1300" "1300" "1300" "1300" ...
## ..$ Time Incubated : chr [1:276] "1600" "1600" "1600" "1600" ...
## ..$ Temp : num [1:276] 21 21 21 21 21 21 21 21 21 21 ...
## ..$ timepoint : chr [1:276] "72hpi" "72hpi" "72hpi" "72hpi" ...
## ..$ infection.type : chr [1:276] "continuous" "continuous" "continuous" "continuous" ...
## ..$ Fixing Date : num [1:276] 190426 190426 190426 190426 190426 ...
## ..$ Location : chr [1:276] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## ..$ Staining Date : num [1:276] 190513 190513 190513 190430 190513 ...
## ..$ Stain type : chr [1:276] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
## ..$ Slide date : num [1:276] 190515 190515 190515 190501 190515 ...
## ..$ Slide number : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ Slide Box : num [1:276] 2 2 2 2 2 2 2 2 2 2 ...
## ..$ Imaging Date : num [1:276] 190516 190516 190516 190502 190516 ...
## $ : tibble [154 x 301] (S3: tbl_df/tbl/data.frame)
## ..$ worm.number : num [1:154] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ 200707_N2_LUAm1_0M_72hpi : chr [1:154] "0_0_18" "0_0_18" "0_0_9" "0_0_15" ...
## ..$ 200707_N2_LUAm1_10M_72hpi : chr [1:154] "0_1_7" "0_1_3" "0_1_10" "0_1_8" ...
## ..$ 200707_JU1400_LUAm1_0M_72hpi : chr [1:154] "0_0_10" "0_0_9" "0_0_13" "0_0_10" ...
## ..$ 200707_JU1400_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200707_ED3052A_LUAm1_0M_72hpi : chr [1:154] "0_0_12" "0_0_11" "0_0_14" "0_0_11" ...
## ..$ 200707_ED3052A_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_1" "0_1_3" ...
## ..$ 200707_ED3052B_LUAm1_0M_72hpi : chr [1:154] "0_0_5" "0_0_12" "0_0_11" "0_0_9" ...
## ..$ 200707_ED3052B_LUAm1_10M_72hpi : chr [1:154] "0_1_9" "0_1_2" "0_1_4" "0_1_0" ...
## ..$ 200707_MY1_LUAm1_0M_72hpi : chr [1:154] "0_0_11" "0_0_9" "0_0_10" "0_0_11" ...
## ..$ 200707_MY1_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_1" "0_1_0" ...
## ..$ 200707_N2_MAM1_4M_72hpi : chr [1:154] "0_1_15" "0_1_18" "0_1_15" "0_1_13" ...
## ..$ 200707_JU1400_MAM1_4M_72hpi : chr [1:154] "1_1_2" "1_1_3" "0_0_8" "0_1_4" ...
## ..$ 200707_N2_LUAm3_10M_72hpi : chr [1:154] "0_1_27" "0_0_7" "0_1_16" "0_1_12" ...
## ..$ 200707_JU1400_LUAm3_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200707_N2_AWRm78_3.5M_72hpi : chr [1:154] "1_1_15" "1_1_4" "1_1_23" "1_1_18" ...
## ..$ 200707_JU1400_AWRm78_3.5M_72hpi : chr [1:154] "0_1_0" "0_0_0" "0_1_3" "1_1_4" ...
## ..$ 200707_N2_ERTm5_1.75M_72hpi : chr [1:154] "1_1_9" "0_1_13" "0_1_0" "1_1_8" ...
## ..$ 200707_N2_ERTm5_3.5M_72hpi : chr [1:154] "1_1_1" "1_1_4" "0_1_12" "1_1_13" ...
## ..$ 200707_JU1400_ERTm5_1.75M_72hpi : chr [1:154] "0_0_3" "0_0_3" "0_0_5" "0_1_8" ...
## ..$ 200707_JU1400_ERTm5_3.5M_72hpi : chr [1:154] "0_0_10" "0_0_0" "0_0_3" "0_0_5" ...
## ..$ 200707_MY1_ERTm5_1.75M_72hpi : chr [1:154] "0_1_10" "0_1_13" "0_1_14" "0_1_6" ...
## ..$ 200707_MY1_ERTm5_3.5M_72hpi : chr [1:154] "0_1_10" "1_1_0" "0_1_12" "1_1_2" ...
## ..$ 200714_N2_LUAm1_0M_72hpi : chr [1:154] "0_0_11" "0_0_19" "0_0_13" "0_0_13" ...
## ..$ 200714_N2_LUAm1_10M_72hpi : chr [1:154] "0_1_11" "0_1_9" "0_1_4" "0_1_10" ...
## ..$ 200714_JU1400_LUAm1_0M_72hpi : chr [1:154] "0_0_8" "0_0_14" "0_0_4" "0_0_10" ...
## ..$ 200714_JU1400_LUAm1_10M_72hpi : chr [1:154] "0_0_0" "0_1_0" "0_1_0" "0_0_0" ...
## ..$ 200714_ED3052A_LUAm1_0M_72hpi : chr [1:154] "0_0_8" "0_0_9" "0_0_6" "0_0_8" ...
## ..$ 200714_ED3052A_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_1" "0_1_0" "0_1_1" ...
## ..$ 200714_ED3052B_LUAm1_0M_72hpi : chr [1:154] "0_0_18" "0_0_7" "0_0_13" "0_0_20" ...
## ..$ 200714_ED3052B_LUAm1_10M_72hpi : chr [1:154] "0_1_5" "0_1_0" "0_1_3" "0_1_5" ...
## ..$ 200714_MY1_LUAm1_0M_72hpi : chr [1:154] "0_0_6" "0_0_10" "0_0_23" "0_0_13" ...
## ..$ 200714_MY1_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200714_N2_MAM1_4M_72hpi : chr [1:154] "0_1_10" "0_1_9" "0_1_18" "0_1_16" ...
## ..$ 200714_JU1400_MAM1_4M_72hpi : chr [1:154] "0_1_0" "0_1_4" "0_1_5" "0_0_9" ...
## ..$ 200714_N2_LUAm3_10M_72hpi : chr [1:154] "0_1_9" "0_1_12" "0_1_12" "0_1_15" ...
## ..$ 200714_JU1400_LUAm3_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200714_N2_AWRm78_3.5M_72hpi : chr [1:154] "1_1_9" "1_1_11" "1_1_11" "1_1_9" ...
## ..$ 200714_JU1400_AWRm78_3.5M_72hpi : chr [1:154] "0_1_11" "0_1_5" "0_1_4" "0_1_10" ...
## ..$ 200714_N2_ERTm5_1.75M_72hpi : chr [1:154] "1_1_16" "1_1_14" "1_1_15" "1_1_8" ...
## ..$ 200714_N2_ERTm5_3.5M_72hpi : chr [1:154] "1_1_9" "1_1_6" "1_1_11" "1_1_15" ...
## ..$ 200714_JU1400_ERTm5_1.75M_72hpi : chr [1:154] "0_0_10" "0_1_10" "0_1_8" "0_1_10" ...
## ..$ 200714_JU1400_ERTm5_3.5M_72hpi : chr [1:154] "0_0_3" "0_0_7" "0_0_4" "0_0_2" ...
## ..$ 200714_MY1_ERTm5_1.75M_72hpi : chr [1:154] "0_1_9" "1_1_4" "0_1_14" "0_1_16" ...
## ..$ 200714_MY1_ERTm5_3.5M_72hpi : chr [1:154] "0_1_9" "0_1_8" "1_1_15" "0_1_10" ...
## ..$ 200714_N2_LUAm1_15M_72hpi : chr [1:154] "NA" "0_1_15" "0_1_10" "0_1_3" ...
## ..$ 200714_JU1400_LUAm1_15M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200714_ED3052A_LUAm1_15M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "0_1_0" ...
## ..$ 200714_ED3052B_LUAm1_15M_72hpi : chr [1:154] "0_1_6" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200714_MY1_LUAm1_15M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200714_N2_MAM1_8M_72hpi : chr [1:154] "0_1_9" "0_1_9" "0_1_15" "0_1_0" ...
## ..$ 200714_JU1400_MAM1_8M_72hpi : chr [1:154] "0_1_6" "0_1_5" "0_1_4" "0_1_8" ...
## ..$ 200721_N2_LUAm1_0M_72hpi : chr [1:154] "0_0_20" "0_0_19" "0_0_16" "0_0_6" ...
## ..$ 200721_N2_LUAm1_10M_72hpi : chr [1:154] "0_1_10" "0_1_13" "0_1_11" "0_1_12" ...
## ..$ 200721_JU1400_LUAm1_0M_72hpi : chr [1:154] "0_0_7" "0_0_0" "0_0_10" "0_0_10" ...
## ..$ 200721_JU1400_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200721_ED3052A_LUAm1_0M_72hpi : chr [1:154] "0_0_12" "0_0_17" "0_0_13" "0_0_12" ...
## ..$ 200721_ED3052A_LUAm1_10M_72hpi : chr [1:154] "0_1_6" "0_1_9" "0_1_5" "0_1_10" ...
## ..$ 200721_ED3052B_LUAm1_0M_72hpi : chr [1:154] "0_0_10" "0_0_12" "0_0_9" "0_0_8" ...
## ..$ 200721_ED3052B_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200721_MY1_LUAm1_0M_72hpi : chr [1:154] "0_0_10" "0_0_11" "0_0_8" "0_0_0" ...
## ..$ 200721_MY1_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_3" "0_1_0" "0_1_0" ...
## ..$ 200721_N2_MAM1_4M_72hpi : chr [1:154] "0_1_21" "0_1_17" "0_1_17" "0_1_10" ...
## ..$ 200721_JU1400_MAM1_4M_72hpi : chr [1:154] "0_1_4" "0_0_7" "0_1_5" "1_1_7" ...
## ..$ 200721_N2_LUAm3_10M_72hpi : chr [1:154] "0_1_11" "0_1_12" "0_1_8" "0_1_15" ...
## ..$ 200721_JU1400_LUAm3_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200721_N2_AWRm78_3.5M_72hpi : chr [1:154] "1_1_4" "1_1_9" "1_1_12" "0_1_14" ...
## ..$ 200721_JU1400_AWRm78_3.5M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "1_1_7" ...
## ..$ 200721_N2_ERTm5_1.75M_72hpi : chr [1:154] "1_1_8" "1_1_12" "1_1_12" "0_1_16" ...
## ..$ 200721_N2_ERTm5_3.5M_72hpi : chr [1:154] "1_1_6" "1_1_4" "1_1_1" "1_1_13" ...
## ..$ 200721_JU1400_ERTm5_1.75M_72hpi : chr [1:154] "0_0_5" "0_0_11" "1_1_7" "0_0_0" ...
## ..$ 200721_JU1400_ERTm5_3.5M_72hpi : chr [1:154] "0_1_3" "0_0_1" "0_1_2" "0_0_0" ...
## ..$ 200721_MY1_ERTm5_1.75M_72hpi : chr [1:154] "1_1_15" "0_1_10" "1_1_20" "0_1_13" ...
## ..$ 200721_MY1_ERTm5_3.5M_72hpi : chr [1:154] "1_1_9" "0_1_6" "0_1_12" "0_1_12" ...
## ..$ 200721_N2_MAM1_8M_72hpi : chr [1:154] "1_1_16" "0_1_10" "0_1_18" "0_1_19" ...
## ..$ 200721_JU1400_MAM1_8M_72hpi : chr [1:154] "0_1_3" "0_1_0" "1_1_3" "1_1_0" ...
## ..$ 200821_N2_LUAm1_0M_72hpi : chr [1:154] "0_0_12" "0_0_14" "0_0_24" "0_0_14" ...
## ..$ 200821_N2_LUAm1_10M_72hpi : chr [1:154] "0_1_9" "0_1_13" "0_1_10" "0_1_14" ...
## ..$ 200821_JU1400_LUAm1_0M_72hpi : chr [1:154] "0_0_15" "0_0_10" "0_0_11" "0_0_17" ...
## ..$ 200821_JU1400_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_4" "0_1_0" "0_1_0" ...
## ..$ 200821_VC40171_LUAm1_0M_72hpi : chr [1:154] "0_0_10" "0_0_11" "0_0_13" "0_0_9" ...
## ..$ 200821_VC40171_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200821_AWR144_LUAm1_0M_72hpi : chr [1:154] "0_0_23" "0_0_20" "0_0_22" "0_0_3" ...
## ..$ 200821_AWR144_LUAm1_10M_72hpi : chr [1:154] "0_1_9" "0_1_11" "0_1_9" "0_1_13" ...
## ..$ 200821_AWR145_LUAm1_0M_72hpi : chr [1:154] "0_0_30" "0_0_23" "0_0_24" "0_0_21" ...
## ..$ 200821_AWR145_LUAm1_10M_72hpi : chr [1:154] "0_1_2" "0_1_0" "0_1_12" "0_1_11" ...
## ..$ 200821_N2_LUAm1-HK_10M_72hpi : chr [1:154] "0_0_12" "0_0_23" "0_0_0" "0_0_15" ...
## ..$ 200821_JU1400_LUAm1-HK_10M_72hpi : chr [1:154] "0_0_14" "0_0_8" "0_0_15" "0_0_17" ...
## ..$ 200821_N2_LUAm1-sup_10M_72hpi : chr [1:154] "0_0_24" "0_0_24" "0_0_26" "0_0_21" ...
## ..$ 200821_JU1400_LUAm1-sup_10M_72hpi : chr [1:154] "0_0_13" "0_0_9" "0_0_11" "0_0_11" ...
## ..$ 200821_N2_LUAm1-pel_10M_72hpi : chr [1:154] "0_1_9" "0_1_11" "0_1_14" "0_1_15" ...
## ..$ 200821_JU1400_LUAm1-pel_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "0_1_1" ...
## ..$ 200825_N2_LUAm1_0M_72hpi : chr [1:154] "0_0_26" "0_0_24" "0_0_23" "0_0_17" ...
## ..$ 200825_N2_LUAm1_10M_72hpi : chr [1:154] "0_1_13" "0_1_12" "0_1_15" "0_1_17" ...
## ..$ 200825_JU1400_LUAm1_0M_72hpi : chr [1:154] "0_0_10" "0_0_12" "0_0_14" "0_0_14" ...
## ..$ 200825_JU1400_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200825_VC40171_LUAm1_0M_72hpi : chr [1:154] "0_0_8" "0_0_11" "0_0_11" "0_0_5" ...
## ..$ 200825_VC40171_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200825_AWR144_LUAm1_0M_72hpi : chr [1:154] "0_0_14" "0_0_27" "0_0_14" "0_0_15" ...
## .. [list output truncated]
## $ : tibble [7 x 5] (S3: tbl_df/tbl/data.frame)
## ..$ spore.strain : chr [1:7] "ERTm1" "ERTm2" "ERTm5" "LUAm1" ...
## ..$ spore.species : chr [1:7] "N. parisii" "N. ausubeli" "N. ironsii" "N. ferruginous" ...
## ..$ infection.location : chr [1:7] "intestine" "intestine" "intestine" "epidermis" ...
## ..$ original.nematode.species: chr [1:7] "C. elegans" "C. briggsae" "C. briggsae" "C. elegans" ...
## ..$ original.location : chr [1:7] "France" "India" "Hawaii, USA" "France" ...
It’s a lot of output, but if we look carefully we can see an unnamed list of 3 elements, each of which is a tibble object.
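Because the returned list is unnamed, we can only index the sheets by position. One optional convenience (not part of the lecture code) is to label each element with a name, e.g. names(excel_sheets_list) <- excel_sheets("data/infection_data_all.xlsx"). A minimal sketch of named-list indexing, using toy data so no file is needed:

```r
# Toy stand-ins for our list of imported sheets
unnamed <- list(data.frame(a = 1:3), data.frame(b = 1:2))

# setNames() attaches names to the list elements
named <- setNames(unnamed, c("metadata", "counts"))

# Each data.frame can now be retrieved by name as well as by position
identical(named[["metadata"]], unnamed[[1]])
```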
Positional matching in lapply(): remember the parameters of
`read_excel(path, sheet = NULL, range = NULL)`
Notice that the second positional parameter is sheet. In our lapply() call we didn’t explicitly name that parameter! Recall we used:
lapply(X = excel_sheets("data/infection_data_all.xlsx"), FUN = read_excel, path = "data/infection_data_all.xlsx")
and thus explicitly named our first parameter, path. The next available parameter in the default order was sheet, to which the elements of X were automatically matched. We now have a list object with each worksheet as one item in the list.
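To see this positional matching in isolation, here is a sketch with a hypothetical stand-in function (describe() is invented for illustration; it mimics the path/sheet parameter order of read_excel()):

```r
# describe() is a made-up stand-in for read_excel(path, sheet)
describe <- function(path, sheet) paste("reading", sheet, "from", path)

# Naming path explicitly means each element of X falls into the
# next unmatched parameter -- sheet
lapply(X = c("sheet1", "sheet2"), FUN = describe, path = "book.xlsx")
# first element: "reading sheet1 from book.xlsx"
```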
If we wanted to name the sheet parameter explicitly, we would need to define our own function in the FUN parameter. While we won’t learn about defining functions until lecture 07, you should be familiar with this idea from lecture 01 (section 4.3.2). In this case, you could use the following code:
# You can define your function directly with FUN = function(x)
str(lapply(X = excel_sheets("data/infection_data_all.xlsx"), # this will set X to a character vector
FUN = function(x) read_excel(path = "data/infection_data_all.xlsx",
sheet = x)
) # End of lapply
) # end of str
## List of 3
## $ : tibble [276 x 29] (S3: tbl_df/tbl/data.frame)
## ..$ experiment : chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
## ..$ experimenter : chr [1:276] "CM" "CM" "CM" "CM" ...
## ..$ description : chr [1:276] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## ..$ Infection Date : num [1:276] 190423 190423 190423 190423 190423 ...
## ..$ Plate Number : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ Worm_strain : chr [1:276] "VC20019" "VC20019" "VC20019" "N2" ...
## ..$ Total Worms : num [1:276] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## ..$ Spore Strain : chr [1:276] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## ..$ Spore Lot : chr [1:276] "2A" "2A" "2A" "2A" ...
## ..$ Lot concentration: num [1:276] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## ..$ Total Spores (M) : num [1:276] 0 10 20 0 10 20 0 10 20 0 ...
## ..$ Total ul spore : num [1:276] 0 56.8 113.6 0 56.8 ...
## ..$ Infection Round : num [1:276] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ 40X OP50 (mL) : num [1:276] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## ..$ Plate Size : num [1:276] 6 6 6 6 6 6 6 6 6 6 ...
## ..$ Spores/cm2 : num [1:276] 0 0.354 0.708 0 0.354 ...
## ..$ Time plated : chr [1:276] "1300" "1300" "1300" "1300" ...
## ..$ Time Incubated : chr [1:276] "1600" "1600" "1600" "1600" ...
## ..$ Temp : num [1:276] 21 21 21 21 21 21 21 21 21 21 ...
## ..$ timepoint : chr [1:276] "72hpi" "72hpi" "72hpi" "72hpi" ...
## ..$ infection.type : chr [1:276] "continuous" "continuous" "continuous" "continuous" ...
## ..$ Fixing Date : num [1:276] 190426 190426 190426 190426 190426 ...
## ..$ Location : chr [1:276] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## ..$ Staining Date : num [1:276] 190513 190513 190513 190430 190513 ...
## ..$ Stain type : chr [1:276] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
## ..$ Slide date : num [1:276] 190515 190515 190515 190501 190515 ...
## ..$ Slide number : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ Slide Box : num [1:276] 2 2 2 2 2 2 2 2 2 2 ...
## ..$ Imaging Date : num [1:276] 190516 190516 190516 190502 190516 ...
## $ : tibble [154 x 301] (S3: tbl_df/tbl/data.frame)
## ..$ worm.number : num [1:154] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ 200707_N2_LUAm1_0M_72hpi : chr [1:154] "0_0_18" "0_0_18" "0_0_9" "0_0_15" ...
## ..$ 200707_N2_LUAm1_10M_72hpi : chr [1:154] "0_1_7" "0_1_3" "0_1_10" "0_1_8" ...
## ..$ 200707_JU1400_LUAm1_0M_72hpi : chr [1:154] "0_0_10" "0_0_9" "0_0_13" "0_0_10" ...
## ..$ 200707_JU1400_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200707_ED3052A_LUAm1_0M_72hpi : chr [1:154] "0_0_12" "0_0_11" "0_0_14" "0_0_11" ...
## ..$ 200707_ED3052A_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_1" "0_1_3" ...
## ..$ 200707_ED3052B_LUAm1_0M_72hpi : chr [1:154] "0_0_5" "0_0_12" "0_0_11" "0_0_9" ...
## ..$ 200707_ED3052B_LUAm1_10M_72hpi : chr [1:154] "0_1_9" "0_1_2" "0_1_4" "0_1_0" ...
## ..$ 200707_MY1_LUAm1_0M_72hpi : chr [1:154] "0_0_11" "0_0_9" "0_0_10" "0_0_11" ...
## ..$ 200707_MY1_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_1" "0_1_0" ...
## ..$ 200707_N2_MAM1_4M_72hpi : chr [1:154] "0_1_15" "0_1_18" "0_1_15" "0_1_13" ...
## ..$ 200707_JU1400_MAM1_4M_72hpi : chr [1:154] "1_1_2" "1_1_3" "0_0_8" "0_1_4" ...
## ..$ 200707_N2_LUAm3_10M_72hpi : chr [1:154] "0_1_27" "0_0_7" "0_1_16" "0_1_12" ...
## ..$ 200707_JU1400_LUAm3_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200707_N2_AWRm78_3.5M_72hpi : chr [1:154] "1_1_15" "1_1_4" "1_1_23" "1_1_18" ...
## ..$ 200707_JU1400_AWRm78_3.5M_72hpi : chr [1:154] "0_1_0" "0_0_0" "0_1_3" "1_1_4" ...
## ..$ 200707_N2_ERTm5_1.75M_72hpi : chr [1:154] "1_1_9" "0_1_13" "0_1_0" "1_1_8" ...
## ..$ 200707_N2_ERTm5_3.5M_72hpi : chr [1:154] "1_1_1" "1_1_4" "0_1_12" "1_1_13" ...
## ..$ 200707_JU1400_ERTm5_1.75M_72hpi : chr [1:154] "0_0_3" "0_0_3" "0_0_5" "0_1_8" ...
## ..$ 200707_JU1400_ERTm5_3.5M_72hpi : chr [1:154] "0_0_10" "0_0_0" "0_0_3" "0_0_5" ...
## ..$ 200707_MY1_ERTm5_1.75M_72hpi : chr [1:154] "0_1_10" "0_1_13" "0_1_14" "0_1_6" ...
## ..$ 200707_MY1_ERTm5_3.5M_72hpi : chr [1:154] "0_1_10" "1_1_0" "0_1_12" "1_1_2" ...
## ..$ 200714_N2_LUAm1_0M_72hpi : chr [1:154] "0_0_11" "0_0_19" "0_0_13" "0_0_13" ...
## ..$ 200714_N2_LUAm1_10M_72hpi : chr [1:154] "0_1_11" "0_1_9" "0_1_4" "0_1_10" ...
## ..$ 200714_JU1400_LUAm1_0M_72hpi : chr [1:154] "0_0_8" "0_0_14" "0_0_4" "0_0_10" ...
## ..$ 200714_JU1400_LUAm1_10M_72hpi : chr [1:154] "0_0_0" "0_1_0" "0_1_0" "0_0_0" ...
## ..$ 200714_ED3052A_LUAm1_0M_72hpi : chr [1:154] "0_0_8" "0_0_9" "0_0_6" "0_0_8" ...
## ..$ 200714_ED3052A_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_1" "0_1_0" "0_1_1" ...
## ..$ 200714_ED3052B_LUAm1_0M_72hpi : chr [1:154] "0_0_18" "0_0_7" "0_0_13" "0_0_20" ...
## ..$ 200714_ED3052B_LUAm1_10M_72hpi : chr [1:154] "0_1_5" "0_1_0" "0_1_3" "0_1_5" ...
## ..$ 200714_MY1_LUAm1_0M_72hpi : chr [1:154] "0_0_6" "0_0_10" "0_0_23" "0_0_13" ...
## ..$ 200714_MY1_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200714_N2_MAM1_4M_72hpi : chr [1:154] "0_1_10" "0_1_9" "0_1_18" "0_1_16" ...
## ..$ 200714_JU1400_MAM1_4M_72hpi : chr [1:154] "0_1_0" "0_1_4" "0_1_5" "0_0_9" ...
## ..$ 200714_N2_LUAm3_10M_72hpi : chr [1:154] "0_1_9" "0_1_12" "0_1_12" "0_1_15" ...
## ..$ 200714_JU1400_LUAm3_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200714_N2_AWRm78_3.5M_72hpi : chr [1:154] "1_1_9" "1_1_11" "1_1_11" "1_1_9" ...
## ..$ 200714_JU1400_AWRm78_3.5M_72hpi : chr [1:154] "0_1_11" "0_1_5" "0_1_4" "0_1_10" ...
## ..$ 200714_N2_ERTm5_1.75M_72hpi : chr [1:154] "1_1_16" "1_1_14" "1_1_15" "1_1_8" ...
## ..$ 200714_N2_ERTm5_3.5M_72hpi : chr [1:154] "1_1_9" "1_1_6" "1_1_11" "1_1_15" ...
## ..$ 200714_JU1400_ERTm5_1.75M_72hpi : chr [1:154] "0_0_10" "0_1_10" "0_1_8" "0_1_10" ...
## ..$ 200714_JU1400_ERTm5_3.5M_72hpi : chr [1:154] "0_0_3" "0_0_7" "0_0_4" "0_0_2" ...
## ..$ 200714_MY1_ERTm5_1.75M_72hpi : chr [1:154] "0_1_9" "1_1_4" "0_1_14" "0_1_16" ...
## ..$ 200714_MY1_ERTm5_3.5M_72hpi : chr [1:154] "0_1_9" "0_1_8" "1_1_15" "0_1_10" ...
## ..$ 200714_N2_LUAm1_15M_72hpi : chr [1:154] "NA" "0_1_15" "0_1_10" "0_1_3" ...
## ..$ 200714_JU1400_LUAm1_15M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200714_ED3052A_LUAm1_15M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "0_1_0" ...
## ..$ 200714_ED3052B_LUAm1_15M_72hpi : chr [1:154] "0_1_6" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200714_MY1_LUAm1_15M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200714_N2_MAM1_8M_72hpi : chr [1:154] "0_1_9" "0_1_9" "0_1_15" "0_1_0" ...
## ..$ 200714_JU1400_MAM1_8M_72hpi : chr [1:154] "0_1_6" "0_1_5" "0_1_4" "0_1_8" ...
## ..$ 200721_N2_LUAm1_0M_72hpi : chr [1:154] "0_0_20" "0_0_19" "0_0_16" "0_0_6" ...
## ..$ 200721_N2_LUAm1_10M_72hpi : chr [1:154] "0_1_10" "0_1_13" "0_1_11" "0_1_12" ...
## ..$ 200721_JU1400_LUAm1_0M_72hpi : chr [1:154] "0_0_7" "0_0_0" "0_0_10" "0_0_10" ...
## ..$ 200721_JU1400_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200721_ED3052A_LUAm1_0M_72hpi : chr [1:154] "0_0_12" "0_0_17" "0_0_13" "0_0_12" ...
## ..$ 200721_ED3052A_LUAm1_10M_72hpi : chr [1:154] "0_1_6" "0_1_9" "0_1_5" "0_1_10" ...
## ..$ 200721_ED3052B_LUAm1_0M_72hpi : chr [1:154] "0_0_10" "0_0_12" "0_0_9" "0_0_8" ...
## ..$ 200721_ED3052B_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200721_MY1_LUAm1_0M_72hpi : chr [1:154] "0_0_10" "0_0_11" "0_0_8" "0_0_0" ...
## ..$ 200721_MY1_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_3" "0_1_0" "0_1_0" ...
## ..$ 200721_N2_MAM1_4M_72hpi : chr [1:154] "0_1_21" "0_1_17" "0_1_17" "0_1_10" ...
## ..$ 200721_JU1400_MAM1_4M_72hpi : chr [1:154] "0_1_4" "0_0_7" "0_1_5" "1_1_7" ...
## ..$ 200721_N2_LUAm3_10M_72hpi : chr [1:154] "0_1_11" "0_1_12" "0_1_8" "0_1_15" ...
## ..$ 200721_JU1400_LUAm3_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200721_N2_AWRm78_3.5M_72hpi : chr [1:154] "1_1_4" "1_1_9" "1_1_12" "0_1_14" ...
## ..$ 200721_JU1400_AWRm78_3.5M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "1_1_7" ...
## ..$ 200721_N2_ERTm5_1.75M_72hpi : chr [1:154] "1_1_8" "1_1_12" "1_1_12" "0_1_16" ...
## ..$ 200721_N2_ERTm5_3.5M_72hpi : chr [1:154] "1_1_6" "1_1_4" "1_1_1" "1_1_13" ...
## ..$ 200721_JU1400_ERTm5_1.75M_72hpi : chr [1:154] "0_0_5" "0_0_11" "1_1_7" "0_0_0" ...
## ..$ 200721_JU1400_ERTm5_3.5M_72hpi : chr [1:154] "0_1_3" "0_0_1" "0_1_2" "0_0_0" ...
## ..$ 200721_MY1_ERTm5_1.75M_72hpi : chr [1:154] "1_1_15" "0_1_10" "1_1_20" "0_1_13" ...
## ..$ 200721_MY1_ERTm5_3.5M_72hpi : chr [1:154] "1_1_9" "0_1_6" "0_1_12" "0_1_12" ...
## ..$ 200721_N2_MAM1_8M_72hpi : chr [1:154] "1_1_16" "0_1_10" "0_1_18" "0_1_19" ...
## ..$ 200721_JU1400_MAM1_8M_72hpi : chr [1:154] "0_1_3" "0_1_0" "1_1_3" "1_1_0" ...
## ..$ 200821_N2_LUAm1_0M_72hpi : chr [1:154] "0_0_12" "0_0_14" "0_0_24" "0_0_14" ...
## ..$ 200821_N2_LUAm1_10M_72hpi : chr [1:154] "0_1_9" "0_1_13" "0_1_10" "0_1_14" ...
## ..$ 200821_JU1400_LUAm1_0M_72hpi : chr [1:154] "0_0_15" "0_0_10" "0_0_11" "0_0_17" ...
## ..$ 200821_JU1400_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_4" "0_1_0" "0_1_0" ...
## ..$ 200821_VC40171_LUAm1_0M_72hpi : chr [1:154] "0_0_10" "0_0_11" "0_0_13" "0_0_9" ...
## ..$ 200821_VC40171_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200821_AWR144_LUAm1_0M_72hpi : chr [1:154] "0_0_23" "0_0_20" "0_0_22" "0_0_3" ...
## ..$ 200821_AWR144_LUAm1_10M_72hpi : chr [1:154] "0_1_9" "0_1_11" "0_1_9" "0_1_13" ...
## ..$ 200821_AWR145_LUAm1_0M_72hpi : chr [1:154] "0_0_30" "0_0_23" "0_0_24" "0_0_21" ...
## ..$ 200821_AWR145_LUAm1_10M_72hpi : chr [1:154] "0_1_2" "0_1_0" "0_1_12" "0_1_11" ...
## ..$ 200821_N2_LUAm1-HK_10M_72hpi : chr [1:154] "0_0_12" "0_0_23" "0_0_0" "0_0_15" ...
## ..$ 200821_JU1400_LUAm1-HK_10M_72hpi : chr [1:154] "0_0_14" "0_0_8" "0_0_15" "0_0_17" ...
## ..$ 200821_N2_LUAm1-sup_10M_72hpi : chr [1:154] "0_0_24" "0_0_24" "0_0_26" "0_0_21" ...
## ..$ 200821_JU1400_LUAm1-sup_10M_72hpi : chr [1:154] "0_0_13" "0_0_9" "0_0_11" "0_0_11" ...
## ..$ 200821_N2_LUAm1-pel_10M_72hpi : chr [1:154] "0_1_9" "0_1_11" "0_1_14" "0_1_15" ...
## ..$ 200821_JU1400_LUAm1-pel_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "0_1_1" ...
## ..$ 200825_N2_LUAm1_0M_72hpi : chr [1:154] "0_0_26" "0_0_24" "0_0_23" "0_0_17" ...
## ..$ 200825_N2_LUAm1_10M_72hpi : chr [1:154] "0_1_13" "0_1_12" "0_1_15" "0_1_17" ...
## ..$ 200825_JU1400_LUAm1_0M_72hpi : chr [1:154] "0_0_10" "0_0_12" "0_0_14" "0_0_14" ...
## ..$ 200825_JU1400_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200825_VC40171_LUAm1_0M_72hpi : chr [1:154] "0_0_8" "0_0_11" "0_0_11" "0_0_5" ...
## ..$ 200825_VC40171_LUAm1_10M_72hpi : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
## ..$ 200825_AWR144_LUAm1_0M_72hpi : chr [1:154] "0_0_14" "0_0_27" "0_0_14" "0_0_15" ...
## .. [list output truncated]
## $ : tibble [7 x 5] (S3: tbl_df/tbl/data.frame)
## ..$ spore.strain : chr [1:7] "ERTm1" "ERTm2" "ERTm5" "LUAm1" ...
## ..$ spore.species : chr [1:7] "N. parisii" "N. ausubeli" "N. ironsii" "N. ferruginous" ...
## ..$ infection.location : chr [1:7] "intestine" "intestine" "intestine" "epidermis" ...
## ..$ original.nematode.species: chr [1:7] "C. elegans" "C. briggsae" "C. briggsae" "C. elegans" ...
## ..$ original.location : chr [1:7] "France" "India" "Hawaii, USA" "France" ...
Remember that, with the list we generated, you can index the tibble you would like to work with using the syntax list[[x]] and store it as a variable using leftward assignment.
Working with lists of data.frames (or tibbles) can be cumbersome, but applying multiple procedures to these objects can be made easier with the purrr package, which extends R’s abilities to associate and run functions on elements from a list.
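As a small taste of purrr (assuming the package is installed; we won’t cover it in depth here), purrr::map() works much like lapply(), and typed variants such as map_int() simplify the result to a vector:

```r
library(purrr)

# A toy list of data.frames standing in for our list of sheets
dfs <- list(meta = data.frame(x = 1:3), counts = data.frame(y = 1:5))

map(dfs, nrow)      # a named list: $meta is 3, $counts is 5
map_int(dfs, nrow)  # a named integer vector instead of a list
```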
# You can see the structure of our first list element.
# Remember the difference between [[]] and []?
str(excel_sheets_list[[1]])
## tibble [276 x 29] (S3: tbl_df/tbl/data.frame)
## $ experiment : chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
## $ experimenter : chr [1:276] "CM" "CM" "CM" "CM" ...
## $ description : chr [1:276] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection Date : num [1:276] 190423 190423 190423 190423 190423 ...
## $ Plate Number : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
## $ Worm_strain : chr [1:276] "VC20019" "VC20019" "VC20019" "N2" ...
## $ Total Worms : num [1:276] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore Strain : chr [1:276] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## $ Spore Lot : chr [1:276] "2A" "2A" "2A" "2A" ...
## $ Lot concentration: num [1:276] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## $ Total Spores (M) : num [1:276] 0 10 20 0 10 20 0 10 20 0 ...
## $ Total ul spore : num [1:276] 0 56.8 113.6 0 56.8 ...
## $ Infection Round : num [1:276] 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num [1:276] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate Size : num [1:276] 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores/cm2 : num [1:276] 0 0.354 0.708 0 0.354 ...
## $ Time plated : chr [1:276] "1300" "1300" "1300" "1300" ...
## $ Time Incubated : chr [1:276] "1600" "1600" "1600" "1600" ...
## $ Temp : num [1:276] 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr [1:276] "72hpi" "72hpi" "72hpi" "72hpi" ...
## $ infection.type : chr [1:276] "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num [1:276] 190426 190426 190426 190426 190426 ...
## $ Location : chr [1:276] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining Date : num [1:276] 190513 190513 190513 190430 190513 ...
## $ Stain type : chr [1:276] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
## $ Slide date : num [1:276] 190515 190515 190515 190501 190515 ...
## $ Slide number : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
## $ Slide Box : num [1:276] 2 2 2 2 2 2 2 2 2 2 ...
## $ Imaging Date : num [1:276] 190516 190516 190516 190502 190516 ...
A tibble is essentially a data.frame. Notice that the object type of our imported sheet isn’t exactly a data.frame; rather, it is a tibble, an extended version of the data.frame. Overall, a tibble replicates the same behaviours as a data.frame, except when printing/displaying (it outputs only the first 10 rows rather than all of them) and in how we subset a single column. As long as you use methods from within the tidyverse, this construct will work just fine.
Subsetting a tibble using the index notation [, 1] returns a tibble object containing the first column of your data. In a data.frame, this same notation would return a vector object. This can sometimes cause type-errors when working with older functions or packages outside the tidyverse. If you want to retrieve a column vector from a tibble object, you can use the $ indexing notation or the dplyr::pull() function.
If you’d like to exclusively work with a data.frame, you
can cast it using the as.data.frame() command.
# Pull a single column from our tibble
print("Indexing a column from a tibble is still a tibble")
## [1] "Indexing a column from a tibble is still a tibble"
str(excel_sheets_list[[1]][,1])
## tibble [276 x 1] (S3: tbl_df/tbl/data.frame)
## $ experiment: chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
# Index a column with the $ notation but you need to know the name of your column
cat("\n") # print a blank line
print("Indexing a column into a vector with $")
## [1] "Indexing a column into a vector with $"
str(excel_sheets_list[[1]]$experiment)
## chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" ...
# Index a column with the pull() function if you know its position or name
cat("\n")
print("Indexing a column into a vector with pull()")
## [1] "Indexing a column into a vector with pull()"
str(pull(excel_sheets_list[[1]], 1))
## chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" ...
# Cast the tibble to a data.frame and then pull a single column
cat("\n")
print("Indexing a column from a data.frame becomes a vector")
## [1] "Indexing a column from a data.frame becomes a vector"
str(data.frame(excel_sheets_list[[1]])[,1])
## chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" ...
At this point, we would like to just use our imported excel worksheet
as a normal data.frame in R. We’ll assign it to a new
variable metadata_sheet.df using the correct indexing
notation.
If you are a Google Sheets person, you can use the package we installed in section 1.4.0 (surprisingly called ‘googlesheets4’) to get your worksheets in and out of R. For more information on googlesheets4, check out the tidyverse/googlesheets4 page.
# Let's assign our first sheet to its own variable
metadata_sheet.df <- as.data.frame(excel_sheets_list[[1]])
str(metadata_sheet.df)
## 'data.frame': 276 obs. of 29 variables:
## $ experiment : chr "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
## $ experimenter : chr "CM" "CM" "CM" "CM" ...
## $ description : chr "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection Date : num 190423 190423 190423 190423 190423 ...
## $ Plate Number : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Worm_strain : chr "VC20019" "VC20019" "VC20019" "N2" ...
## $ Total Worms : num 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore Strain : chr "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## $ Spore Lot : chr "2A" "2A" "2A" "2A" ...
## $ Lot concentration: num 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## $ Total Spores (M) : num 0 10 20 0 10 20 0 10 20 0 ...
## $ Total ul spore : num 0 56.8 113.6 0 56.8 ...
## $ Infection Round : num 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate Size : num 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores/cm2 : num 0 0.354 0.708 0 0.354 ...
## $ Time plated : chr "1300" "1300" "1300" "1300" ...
## $ Time Incubated : chr "1600" "1600" "1600" "1600" ...
## $ Temp : num 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr "72hpi" "72hpi" "72hpi" "72hpi" ...
## $ infection.type : chr "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num 190426 190426 190426 190426 190426 ...
## $ Location : chr "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining Date : num 190513 190513 190513 190430 190513 ...
## $ Stain type : chr "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
## $ Slide date : num 190515 190515 190515 190501 190515 ...
## $ Slide number : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Slide Box : num 2 2 2 2 2 2 2 2 2 2 ...
## $ Imaging Date : num 190516 190516 190516 190502 190516 ...
What’s the difference between data.frame() and as.data.frame()? Without getting bogged down in the details, there is a distinction between the data.frame() and as.data.frame() functions. The former can be used to create a data.frame from scratch. As we saw in Lecture 01, you can provide one or more vectors of the same length to produce a data.frame object. On the other hand, if you want to convert a data.frame-like object (i.e., a matrix, tibble, or array) to a data.frame, you could use data.frame(), BUT it is slightly slower than as.data.frame(), which is specifically designed to accept a single argument to be converted into a data.frame.
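As a quick sketch of that distinction (the object names here are just for illustration):

```r
# data.frame() builds a brand-new data.frame from vectors of equal length
df_new <- data.frame(gene = c("geneA", "geneB"),
                     count = c(2, 4))

# as.data.frame() converts an existing data.frame-like object (here a matrix)
m <- matrix(1:4, nrow = 2)
df_converted <- as.data.frame(m)

class(df_new)        # "data.frame"
class(df_converted)  # "data.frame"
```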
Comprehension question 2.0.0: Compare the structure information for the tibble version of our imported data in section 2.3.1 versus the above data frame version of the data. What differences do you notice about the columns? Name some other differences between a tibble and a data frame.
Image courtesy of xkcd at https://xkcd.com/2054/
We’ll often make assumptions about our datasets, like all of the values for a variable are within a certain range, or all positive. We also usually assume that all of the entries in our data are complete - no missing values or incorrect categories. This can be a bit of a trap - especially in large datasets where we cannot view it all by eye. Here we’ll discuss some helpful tools for inspecting your data before you start using more complex code for it.
When first importing data (especially from outside sources) it is best to inspect it for problems like missing values, inconsistent formatting, special characters, etc. Here, we’ll inspect our dataset, store it in a variable, and check out the structure by reviewing some helpful commands:
class() to quickly determine the object type. You see this information in the str() command too.
head() to quickly view just the first n rows of your data.
tail() to quickly view just the last n rows of your data.
unique() to quickly view the unique values in a vector or similar data structure.
glimpse() and View() (in RStudio) to take a peek at your data structures.

head() to view the first portion of your data

You can take a look at the first few rows (6 by default) of your data.frame using the head() function. In fact you can play with the parameters to pull a specific number of rows or lines from the start of your data.frame or other object.
# Re-import our infection_meta.csv file from the data folder if you need to
# infection_meta.tbl <- read_csv(file = "data/infection_meta.csv", col_names = TRUE, col_types = cols())
# Use default head() parameters
head(infection_meta.tbl)
## # A tibble: 6 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_VC20019_LUAm1~ CM           Wild isola~           190423              1
## 2 190426_VC20019_LUAm1~ CM           Wild isola~           190423              2
## 3 190426_VC20019_LUAm1~ CM           Wild isola~           190423              3
## 4 190426_N2_LUAm1_0M_7~ CM           Wild isola~           190423              4
## 5 190426_N2_LUAm1_10M_~ CM           Wild isola~           190423              5
## 6 190426_N2_LUAm1_20M_~ CM           Wild isola~           190423              6
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores(M)/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...
# Pull just the first 3 rows
head(infection_meta.tbl, 3)
## # A tibble: 3 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_VC20019_LUAm1~ CM           Wild isola~           190423              1
## 2 190426_VC20019_LUAm1~ CM           Wild isola~           190423              2
## 3 190426_VC20019_LUAm1~ CM           Wild isola~           190423              3
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores(M)/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...
tail() to view the latter portion of your data

Likewise, to inspect the last rows, you can use the tail() function. Again, you can decide how many rows from the end of your object you’d like to see. Note that this still displays in the original order of the data frame rather than in reverse.
# Let's pull up the last 10 rows to look at!
tail(infection_meta.tbl, 10)
## # A tibble: 10 x 29
##    experiment           experimenter description `Infection Date` `Plate Number`
##    <chr>                <chr>        <chr>                  <dbl>          <dbl>
##  1 200916_JU1400_ERTm5~ CM           NIL tests ~           200912             13
##  2 200916_JU1400_ERTm5~ CM           NIL tests ~           200912             14
##  3 200918_N2_ERTm5_0M_~ CM           NIL tests ~           200915              1
##  4 200918_N2_ERTm5_3.5~ CM           NIL tests ~           200915              2
##  5 200918_JU1400_ERTm5~ CM           NIL tests ~           200915              3
##  6 200918_JU1400_ERTm5~ CM           NIL tests ~           200915              4
##  7 200918_AWR144_ERTm5~ CM           NIL tests ~           200915              5
##  8 200918_AWR144_ERTm5~ CM           NIL tests ~           200915              6
##  9 200918_AWR145_ERTm5~ CM           NIL tests ~           200915              7
## 10 200918_AWR145_ERTm5~ CM           NIL tests ~           200915              8
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores(M)/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...
unique() to retrieve a list of the unique elements within an object

You may be interested in knowing more about the data set you’re working with, such as “How many C. elegans strains or microsporidia strains are we working with across these experiments?” Recall that we have columns labeled Worm_strain and Spore Strain within our data set. Don’t worry, we’ll learn more about simplifying our column names later!

You could extract the whole column and scan through it or look at just a portion of it.
# Recall: Use the $ sign to access named columns within your data.frame!
infection_meta.tbl$Worm_strain
## [1] "VC20019" "VC20019" "VC20019" "N2" "N2"
## [6] "N2" "AB1" "AB1" "AB1" "JU397"
## [11] "JU397" "JU397" "JU642" "JU642" "JU642"
## [16] "MY6" "MY6" "MY6" "ED3042" "ED3042"
## [21] "ED3042" "JU360" "JU360" "JU360" "JU1400"
## [26] "JU1400" "JU1400" "MY1" "MY1" "MY1"
## [31] "Lua1" "Lua1" "Lua1" "VC20019" "VC20019"
## [36] "VC20019" "N2" "N2" "N2" "CB4856"
## [41] "CB4856" "CB4856" "JU300" "JU300" "JU300"
## [46] "JU1400" "JU1400" "JU1400" "MY2" "MY2"
## [51] "MY2" "VC20019" "VC20019" "VC20019" "N2"
## [56] "N2" "N2" "JU360" "JU360" "JU360"
## [61] "MY2" "MY2" "MY2" "N2" "Lua1"
## [66] "JU1400" "N2" "Lua1" "JU1400" "N2"
## [71] "Lua1" "JU1400" "N2" "Lua1" "JU1400"
## [76] "N2" "N2" "JU1400" "N2" "JU1400"
## [81] "N2" "JU1400" "N2" "JU1400" "N2"
## [86] "JU1400" "N2" "JU1400" "N2" "JU1400"
## [91] "N2" "JU1400" "VC20019" "VC20019" "JU1400"
## [96] "VC20019" "JU1400" "VC20019" "JU1400" "VC20019"
## [101] "JU1400" "VC20019" "JU1400" "VC20019" "JU1400"
## [106] "VC20019" "JU1400" "VC20019" "JU1400" "VC20019"
## [111] "JU1400" "N2" "N2" "JU1400" "JU1400"
## [116] "ED3052A" "ED3052A" "ED3052B" "ED3052B" "MY1"
## [121] "MY1" "N2" "JU1400" "N2" "JU1400"
## [126] "N2" "JU1400" "N2" "N2" "JU1400"
## [131] "JU1400" "MY1" "MY1" "N2" "N2"
## [136] "JU1400" "JU1400" "ED3052A" "ED3052A" "ED3052B"
## [141] "ED3052B" "MY1" "MY1" "N2" "JU1400"
## [146] "N2" "JU1400" "N2" "JU1400" "N2"
## [151] "N2" "JU1400" "JU1400" "MY1" "MY1"
## [156] "N2" "JU1400" "ED3052A" "ED3052B" "MY1"
## [161] "N2" "JU1400" "N2" "N2" "JU1400"
## [166] "JU1400" "ED3052A" "ED3052A" "ED3052B" "ED3052B"
## [171] "MY1" "MY1" "N2" "JU1400" "N2"
## [176] "JU1400" "N2" "JU1400" "N2" "N2"
## [181] "JU1400" "JU1400" "MY1" "MY1" "N2"
## [186] "JU1400" "N2" "N2" "JU1400" "JU1400"
## [191] "VC40171" "VC40171" "AWR144" "AWR144" "AWR145"
## [196] "AWR145" "N2" "JU1400" "N2" "JU1400"
## [201] "N2" "JU1400" "N2" "JU1400" "N2"
## [206] "JU1400" "N2" "JU1400" "N2" "JU1400"
## [211] "N2" "N2" "JU1400" "JU1400" "VC40171"
## [216] "VC40171" "AWR144" "AWR144" "AWR145" "AWR145"
## [221] "N2-rep1" "JU1400-rep1" "N2-rep1" "JU1400-rep1" "N2-rep1"
## [226] "JU1400-rep1" "N2-rep1" "JU1400-rep1" "N2" "N2"
## [231] "JU1400" "JU1400" "VC40171" "VC40171" "AWR144"
## [236] "AWR144" "AWR145" "AWR145" "N2" "JU1400"
## [241] "AWR144" "AWR145" "N2" "N2" "N2"
## [246] "JU1400" "JU1400" "JU1400" "N2" "N2"
## [251] "JU1400" "JU1400" "N2" "JU1400" "N2"
## [256] "N2" "JU1400" "JU1400" "AWR144" "AWR144"
## [261] "AWR145" "AWR145" "N2" "N2" "N2"
## [266] "JU1400" "JU1400" "JU1400" "N2" "N2"
## [271] "JU1400" "JU1400" "AWR144" "AWR144" "AWR145"
## [276] "AWR145"
As you may have noticed, this method printed the entire Worm_strain column. While that may be useful in some situations, it doesn’t answer our main question of how many different nematode strains were used across our experiments.
The function unique() can help us answer this question
by removing duplicated entries, thus living up to its name. It can take
in a number of different objects but usually returns an object of the
same type that it was given as input.
Let’s take a look at using it on our question.
# Retrieve the unique worm strains from our data set
unique(infection_meta.tbl$Worm_strain)
## [1] "VC20019" "N2" "AB1" "JU397" "JU642"
## [6] "MY6" "ED3042" "JU360" "JU1400" "MY1"
## [11] "Lua1" "CB4856" "JU300" "MY2" "ED3052A"
## [16] "ED3052B" "VC40171" "AWR144" "AWR145" "N2-rep1"
## [21] "JU1400-rep1"
length() or str() to retrieve the size of some objects

Note from above that we have only one entry per strain, but how many strains are there in total? Recall from Lecture 01 we used the length() function, which does just as it implies by returning the length of a vector, list, or factor. You can also use it to set the length of those objects, but that’s not something we often have reason to do.

On the other hand, str() always gives us the same kind of information plus a little more. Later on, we’ll see that more isn’t always better and that using length() has its advantages.
# Two ways to see how many unique entries we have
# ?length
length(unique(infection_meta.tbl$Worm_strain))
## [1] 21
# or
str(unique(infection_meta.tbl$Worm_strain))
## chr [1:21] "VC20019" "N2" "AB1" "JU397" "JU642" "MY6" "ED3042" "JU360" ...
Using unique() we are returned a character vector containing 21 C. elegans strains. As you can see, a function like length() returns a simple vector value, which can become very helpful from a programmatic standpoint. The str() function, on the other hand, returns much more human-readable information but is not readily usable as input for other functions.
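To see why a plain value is programmatically handy, here is a small sketch using a toy vector (not the course data):

```r
# A toy strain vector for illustration only
strains <- c("N2", "JU1400", "N2", "AWR144")

# length() returns a plain number we can slot directly into other code
n_strains <- length(unique(strains))
paste("This subset contains", n_strains, "unique strains")
# [1] "This subset contains 3 unique strains"
```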
glimpse() and View() show us our data

Suppose we want to see more of our data frame. There are a couple of choices inside of RStudio. In this IDE, you have access to the Environment pane, which can give you a quick idea of the values of variables in your environment, including a bit of what your tibble or data.frame looks like.

Clicking on a data object like infection_meta.tbl will open a new tab that shows your entire tibble in a human-readable format similar to an Excel spreadsheet. The same result can be accomplished with the View() command: View(infection_meta.tbl).
The glimpse() command comes from the dplyr
package and brings up a comprehensive summary of your object that looks
very similar to the information provided in the Environment pane. You’ll
find it looks very much like the str() command but is
formatted in a more human-readable way. It tries to provide as much
information as possible in a small amount of space.
We can use this command in a code cell so let’s take a glimpse at
glimpse().
# Only works in RStudio
View(infection_meta.tbl)
# Let's compare str() to glimpse()
str(infection_meta.tbl)
## spc_tbl_ [276 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ experiment : chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
## $ experimenter : chr [1:276] "CM" "CM" "CM" "CM" ...
## $ description : chr [1:276] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection Date : num [1:276] 190423 190423 190423 190423 190423 ...
## $ Plate Number : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
## $ Worm_strain : chr [1:276] "VC20019" "VC20019" "VC20019" "N2" ...
## $ Total Worms : num [1:276] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore Strain : chr [1:276] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## $ Spore Lot : chr [1:276] "2A" "2A" "2A" "2A" ...
## $ Lot concentration: num [1:276] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## $ Total Spores (M) : num [1:276] 0 10 20 0 10 20 0 10 20 0 ...
## $ Total ul spore : num [1:276] 0 56.8 113.6 0 56.8 ...
## $ Infection Round : num [1:276] 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num [1:276] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate Size : num [1:276] 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores(M)/cm2 : num [1:276] 0 0.354 0.708 0 0.354 ...
## $ Time plated : num [1:276] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
## $ Time Incubated : num [1:276] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
## $ Temp : num [1:276] 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr [1:276] "72" "72" "72" "72" ...
## $ infection.type : chr [1:276] "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num [1:276] 190426 190426 190426 190426 190426 ...
## $ Location : chr [1:276] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining Date : num [1:276] 190513 190513 190513 190430 190513 ...
## $ Stain type : chr [1:276] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
## $ Slide date : num [1:276] 190515 190515 190515 190501 190515 ...
## $ Slide number : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
## $ Slide Box : num [1:276] 2 2 2 2 2 2 2 2 2 2 ...
## $ Imaging Date : num [1:276] 190516 190516 190516 190502 190516 ...
## - attr(*, "spec")=
## .. cols(
## ..   experiment = col_character(),
## ..   experimenter = col_character(),
## ..   description = col_character(),
## ..   `Infection Date` = col_double(),
## ..   `Plate Number` = col_double(),
## ..   Worm_strain = col_character(),
## ..   `Total Worms` = col_double(),
## ..   `Spore Strain` = col_character(),
## ..   `Spore Lot` = col_character(),
## ..   `Lot concentration` = col_double(),
## ..   `Total Spores (M)` = col_double(),
## ..   `Total ul spore` = col_double(),
## ..   `Infection Round` = col_double(),
## ..   `40X OP50 (mL)` = col_double(),
## ..   `Plate Size` = col_double(),
## ..   `Spores(M)/cm2` = col_double(),
## ..   `Time plated` = col_double(),
## ..   `Time Incubated` = col_double(),
## ..   Temp = col_double(),
## ..   timepoint = col_character(),
## ..   infection.type = col_character(),
## ..   `Fixing Date` = col_double(),
## ..   Location = col_character(),
## ..   `Staining Date` = col_double(),
## ..   `Stain type` = col_character(),
## ..   `Slide date` = col_double(),
## ..   `Slide number` = col_double(),
## ..   `Slide Box` = col_double(),
## ..   `Imaging Date` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
# glimpse gives us less information overall but is also less redundant
glimpse(infection_meta.tbl)
## Rows: 276
## Columns: 29
## $ experiment          <chr> "190426_VC20019_LUAm1_0M_72hpi", "190426_VC20019_L~
## $ experimenter        <chr> "CM", "CM", "CM", "CM", "CM", "CM", "CM", "CM", "C~
## $ description         <chr> "Wild isolate phenoMIP retest", "Wild isolate phen~
## $ `Infection Date`    <dbl> 190423, 190423, 190423, 190423, 190423, 190423, 19~
## $ `Plate Number`      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,~
## $ Worm_strain         <chr> "VC20019", "VC20019", "VC20019", "N2", "N2", "N2",~
## $ `Total Worms`       <dbl> 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 10~
## $ `Spore Strain`      <chr> "LUAm1", "LUAm1", "LUAm1", "LUAm1", "LUAm1", "LUAm~
## $ `Spore Lot`         <chr> "2A", "2A", "2A", "2A", "2A", "2A", "2A", "2A", "2~
## $ `Lot concentration` <dbl> 176000, 176000, 176000, 176000, 176000, 176000, 17~
## $ `Total Spores (M)`  <dbl> 0, 10, 20, 0, 10, 20, 0, 10, 20, 0, 10, 20, 0, 10,~
## $ `Total ul spore`    <dbl> 0.00000, 56.81818, 113.63636, 0.00000, 56.81818, 1~
## $ `Infection Round`   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ `40X OP50 (mL)`     <dbl> 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.~
## $ `Plate Size`        <dbl> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,~
## $ `Spores(M)/cm2`     <dbl> 0.0000000, 0.3538570, 0.7077141, 0.0000000, 0.3538~
## $ `Time plated`       <dbl> 1300, 1300, 1300, 1300, 1300, 1300, 1300, 1300, 13~
## $ `Time Incubated`    <dbl> 1600, 1600, 1600, 1600, 1600, 1600, 1600, 1600, 16~
## $ Temp                <dbl> 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21~
## $ timepoint           <chr> "72", "72", "72", "72", "72", "72", "72", "72", "7~
## $ infection.type      <chr> "continuous", "continuous", "continuous", "continu~
## $ `Fixing Date`       <dbl> 190426, 190426, 190426, 190426, 190426, 190426, 19~
## $ Location            <chr> "Sample exhausted", "Sample exhausted", "Sample ex~
## $ `Staining Date`     <dbl> 190513, 190513, 190513, 190430, 190513, 190513, 19~
## $ `Stain type`        <chr> "Sp.9 FISH + DY96", "Sp.9 FISH + DY96", "Sp.9 FISH~
## $ `Slide date`        <dbl> 190515, 190515, 190515, 190501, 190515, 190515, 19~
## $ `Slide number`      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,~
## $ `Slide Box`         <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,~
## $ `Imaging Date`      <dbl> 190516, 190516, 190516, 190502, 190516, 190516, 19~
So the information provided by glimpse() is sparser, the formatting is a little tighter, and we don’t see the extra column-attribute information that str() prints, which can save a lot of vertical space. On the other hand, the command takes longer to type, but that’s a personal choice.
How does dplyr handle column names with spaces? Look at the output from glimpse() above versus our use of str(). The use of glimpse() gives us another peek under the hood by showing us the true names of the columns. Recall we emphasized that whitespace helps the R interpreter to recognize where one piece of code ends and the next begins.
In order to access column names through methods like the $ indexing method, we can’t normally include spaces in names. To get around this limitation, the tibble actually uses the grave accent (`) diacritical (AKA a back-tick) on both sides of the column name (when necessary). On most US keyboards, this key is located just to the left of the “1” key along with the “~” symbol.
So if we wanted to access a column like “Fixing Date” we would actually need to use $`Fixing Date` instead! The same idea will apply later when we start working with functions from the dplyr package.
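Here is a minimal sketch with a toy table (not the course data) showing the back-ticks in action:

```r
# check.names = FALSE preserves the space in the column name
toy.df <- data.frame(`Fixing Date` = c(190426, 190427),
                     check.names = FALSE)

toy.df$`Fixing Date`   # back-ticks make the name legal in code
# toy.df$Fixing Date   # without back-ticks, this line is a syntax error
```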
NA and NaN values

What happens when you import data with missing values? These could be empty entries in a CSV file or blank cells in an xlsx file. Perhaps, as we’ll see later, it could be a specifically annotated entry like “No_Data”. These are usually the result of missing data points from an experiment, but they could have other origins, such as below-threshold values, depending on the source of your data.
Missing values in R are represented as NA (Not Available). Impossible values (like the result of dividing zero by zero) are represented by NaN (Not a Number). Both can be considered null values. These values, especially NAs, need to be handled in special ways; otherwise they may lead to errors in functions that we frequently use.
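Before building a data set with NAs, a quick sketch of how NA and NaN relate:

```r
0 / 0          # NaN: the result of an impossible calculation
is.na(NaN)     # TRUE - NaN also counts as missing
is.nan(NA)     # FALSE - but a plain NA is not NaN
```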
Let us begin by building an example containing NA
values.
# Set up some vectors for a data.frame
modern_domain <- c("Archaea", "Bacteria", "Eukarya", NA, NA)
five_domains <- c("Archaea", "Bacteria", "Eukarya", "Virusobiota", "Prionobiota")
six_kingdoms <- c(1, 1, 4, NA, NA)
# Put it all together in a call to data.frame()
NA_example <- data.frame(five_domains, modern_domain, six_kingdoms)
# Look at our data frame
NA_example
## five_domains modern_domain six_kingdoms
## 1 Archaea Archaea 1
## 2 Bacteria Bacteria 1
## 3 Eukarya Eukarya 4
## 4 Virusobiota <NA> NA
## 5 Prionobiota <NA> NA
NA values

R will not abide an NA value when completing a calculation. If it encounters an NA, then it will return an NA. Some mathematical functions, however, can ignore NA values when you explicitly set the logical parameter na.rm = TRUE. Under the hood, if the function recognizes this parameter, it removes the NA values before performing its mathematical operation. If you fail to set this parameter correctly, then the function may return an NA value.
# Use the sum() function and see what happens with NA values
sum(six_kingdoms) # some functions need to be explicitly told what to do with NAs. No errors though!
## [1] NA
sum(six_kingdoms, na.rm = TRUE) # Avoid using just "T" as an abbreviation for "TRUE"
## [1] 6
apply() on data with NAs?

Let’s recreate the counts data from Lecture 01 and add a few NAs. If I now use the apply() function to calculate the mean number of counts across each row (i.e., genes), I will get NA as an answer for the rows that had NAs.
counts <- data.frame(Site1 = c(geneA = 2, geneB = 4, geneC = 12, geneD = 8),
Site2 = c(geneA = 15, geneB = NA, geneC = 27, geneD = 28),
Site3 = c(geneA = 10, geneB = 7, geneC = 13, geneD = NA))
counts
## Site1 Site2 Site3
## geneA 2 15 10
## geneB 4 NA 7
## geneC 12 27 13
## geneD 8 28 NA
# Notice that we can only pass the function name "mean" and not any parameters
apply(X = counts, MARGIN = 1, FUN = mean)
## geneA geneB geneC geneD
## 9.00000 NA 17.33333 NA
Recall: we can pass additional parameters to apply() that are meant as parameters for our function FUN. So all we have to do is update the code to include the na.rm = TRUE parameter.
# Pass parameters in our call
apply(X = counts, MARGIN = 1,
FUN = mean, na.rm = TRUE)
## geneA geneB geneC geneD
## 9.00000 5.50000 17.33333 18.00000
# Equivalent code - perhaps clearer but more verbose
apply(X = counts, MARGIN = 1,
FUN = function(x) mean(x, na.rm = TRUE))
## geneA geneB geneC geneD
## 9.00000 5.50000 17.33333 18.00000
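For row means specifically, base R also offers rowMeans() as a shortcut for this apply() pattern (the small counts table is recreated here so the chunk stands alone):

```r
counts <- data.frame(Site1 = c(geneA = 2, geneB = 4, geneC = 12, geneD = 8),
                     Site2 = c(geneA = 15, geneB = NA, geneC = 27, geneD = 28),
                     Site3 = c(geneA = 10, geneB = 7, geneC = 13, geneD = NA))

# rowMeans() accepts na.rm directly - no anonymous function needed
rowMeans(counts, na.rm = TRUE)
# same result as the apply() call above
```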
is.na() function to check your data

How do we find out ahead of time that we are missing data? Knowing is half the battle, and is.na() can help us determine this with some data structures. The is.na() function can search through data structures and return a logical data structure of the same dimensions.
With a vector we can easily see how some basic functions work.
# Let's check out this vector that contains NA values
na_vector <- c(5, 6, NA, 7, 7, NA)
# This works on vectors...
is.na(na_vector)
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
# and data.frames too!
is.na(counts)
## Site1 Site2 Site3
## geneA FALSE FALSE FALSE
## geneB FALSE TRUE FALSE
## geneC FALSE FALSE FALSE
## geneD FALSE FALSE TRUE
# Let's look at our infection metadata for NA values
is.na(infection_meta.tbl)
##       experiment experimenter description Infection Date Plate Number
##  [1,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [2,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [3,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [4,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [5,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [6,]      FALSE        FALSE       FALSE          FALSE        FALSE
## ... (remaining rows and columns omitted: the full output is a 276 x 29 logical matrix)
## [173,] FALSE FALSE FALSE FALSE FALSE
## [174,] FALSE FALSE FALSE FALSE FALSE
## [175,] FALSE FALSE FALSE FALSE FALSE
## [176,] FALSE FALSE FALSE FALSE FALSE
## [177,] FALSE FALSE FALSE FALSE FALSE
## [178,] FALSE FALSE FALSE FALSE FALSE
## [179,] FALSE FALSE FALSE FALSE FALSE
## [180,] FALSE FALSE FALSE FALSE FALSE
## [181,] FALSE FALSE FALSE FALSE FALSE
## [182,] FALSE FALSE FALSE FALSE FALSE
## [183,] FALSE FALSE FALSE FALSE FALSE
## [184,] FALSE FALSE FALSE FALSE FALSE
## [185,] FALSE FALSE FALSE FALSE FALSE
## [186,] FALSE FALSE FALSE FALSE FALSE
## [187,] FALSE FALSE FALSE FALSE FALSE
## [188,] FALSE FALSE FALSE FALSE FALSE
## [189,] FALSE FALSE FALSE FALSE FALSE
## [190,] FALSE FALSE FALSE FALSE FALSE
## [191,] FALSE FALSE FALSE FALSE FALSE
## [192,] FALSE FALSE FALSE FALSE FALSE
## [193,] FALSE FALSE FALSE FALSE FALSE
## [194,] FALSE FALSE FALSE FALSE FALSE
## [195,] FALSE FALSE FALSE FALSE FALSE
## [196,] FALSE FALSE FALSE FALSE FALSE
## [197,] FALSE FALSE FALSE FALSE FALSE
## [198,] FALSE FALSE FALSE FALSE FALSE
## [199,] FALSE FALSE FALSE FALSE FALSE
## [200,] FALSE FALSE FALSE FALSE FALSE
## [201,] FALSE FALSE FALSE FALSE FALSE
## [202,] FALSE FALSE FALSE FALSE FALSE
## [203,] FALSE FALSE FALSE FALSE FALSE
## [204,] FALSE FALSE FALSE FALSE FALSE
## [205,] FALSE FALSE FALSE FALSE FALSE
## [206,] FALSE FALSE FALSE FALSE FALSE
## [207,] FALSE FALSE FALSE FALSE FALSE
## [208,] FALSE FALSE FALSE FALSE FALSE
## [209,] FALSE FALSE FALSE FALSE FALSE
## [210,] FALSE FALSE FALSE FALSE FALSE
## [211,] FALSE FALSE FALSE FALSE FALSE
## [212,] FALSE FALSE FALSE FALSE FALSE
## [213,] FALSE FALSE FALSE FALSE FALSE
## [214,] FALSE FALSE FALSE FALSE FALSE
## [215,] FALSE FALSE FALSE FALSE FALSE
## [216,] FALSE FALSE FALSE FALSE FALSE
## [217,] FALSE FALSE FALSE FALSE FALSE
## [218,] FALSE FALSE FALSE FALSE FALSE
## [219,] FALSE FALSE FALSE FALSE FALSE
## [220,] FALSE FALSE FALSE FALSE FALSE
## [221,] FALSE FALSE FALSE FALSE FALSE
## [222,] FALSE FALSE FALSE FALSE FALSE
## [223,] FALSE FALSE FALSE FALSE FALSE
## [224,] FALSE FALSE FALSE FALSE FALSE
## [225,] FALSE FALSE FALSE FALSE FALSE
## [226,] FALSE FALSE FALSE FALSE FALSE
## [227,] FALSE FALSE FALSE FALSE FALSE
## [228,] FALSE FALSE FALSE FALSE FALSE
## [229,] FALSE FALSE FALSE FALSE FALSE
## [230,] FALSE FALSE FALSE FALSE FALSE
## [231,] FALSE FALSE FALSE FALSE FALSE
## [232,] FALSE FALSE FALSE FALSE FALSE
## [233,] FALSE FALSE FALSE FALSE FALSE
## [234,] FALSE FALSE FALSE FALSE FALSE
## [235,] FALSE FALSE FALSE FALSE FALSE
## [236,] FALSE FALSE FALSE FALSE FALSE
## [237,] FALSE FALSE FALSE FALSE FALSE
## [238,] FALSE FALSE FALSE FALSE FALSE
## [239,] FALSE FALSE FALSE FALSE FALSE
## [240,] FALSE FALSE FALSE FALSE FALSE
## [241,] FALSE FALSE FALSE FALSE FALSE
## [242,] FALSE FALSE FALSE FALSE FALSE
## [243,] FALSE FALSE FALSE FALSE FALSE
## [244,] FALSE FALSE FALSE FALSE FALSE
## [245,] FALSE FALSE FALSE FALSE FALSE
## [246,] FALSE FALSE FALSE FALSE FALSE
## [247,] FALSE FALSE FALSE FALSE FALSE
## [248,] FALSE FALSE FALSE FALSE FALSE
## [249,] FALSE FALSE FALSE FALSE FALSE
## [250,] FALSE FALSE FALSE FALSE FALSE
## [251,] FALSE FALSE FALSE FALSE FALSE
## [252,] FALSE FALSE FALSE FALSE FALSE
## [253,] FALSE FALSE FALSE FALSE FALSE
## [254,] FALSE FALSE FALSE FALSE FALSE
## [255,] FALSE FALSE FALSE FALSE FALSE
## [256,] FALSE FALSE FALSE FALSE FALSE
## [257,] FALSE FALSE FALSE FALSE FALSE
## [258,] FALSE FALSE FALSE FALSE FALSE
## [259,] FALSE FALSE FALSE FALSE FALSE
## [260,] FALSE FALSE FALSE FALSE FALSE
## [261,] FALSE FALSE FALSE FALSE FALSE
## [262,] FALSE FALSE FALSE FALSE FALSE
## [263,] FALSE FALSE FALSE FALSE FALSE
## [264,] FALSE FALSE FALSE FALSE FALSE
## [265,] FALSE FALSE FALSE FALSE FALSE
## [266,] FALSE FALSE FALSE FALSE FALSE
## [267,] FALSE FALSE FALSE FALSE FALSE
## [268,] FALSE FALSE FALSE FALSE FALSE
## [269,] FALSE FALSE FALSE FALSE FALSE
## [270,] FALSE FALSE FALSE FALSE FALSE
## [271,] FALSE FALSE FALSE FALSE FALSE
## [272,] FALSE FALSE FALSE FALSE FALSE
## [273,] FALSE FALSE FALSE FALSE FALSE
## [274,] FALSE FALSE FALSE FALSE FALSE
## [275,] FALSE FALSE FALSE FALSE FALSE
## [276,] FALSE FALSE FALSE FALSE FALSE
## Worm_strain Total Worms Spore Strain Spore Lot Lot concentration
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE
## [11,] FALSE FALSE FALSE FALSE FALSE
## [12,] FALSE FALSE FALSE FALSE FALSE
## [13,] FALSE FALSE FALSE FALSE FALSE
## [14,] FALSE FALSE FALSE FALSE FALSE
## [15,] FALSE FALSE FALSE FALSE FALSE
## [16,] FALSE FALSE FALSE FALSE FALSE
## [17,] FALSE FALSE FALSE FALSE FALSE
## [18,] FALSE FALSE FALSE FALSE FALSE
## [19,] FALSE FALSE FALSE FALSE FALSE
## [20,] FALSE FALSE FALSE FALSE FALSE
## [21,] FALSE FALSE FALSE FALSE FALSE
## [22,] FALSE FALSE FALSE FALSE FALSE
## [23,] FALSE FALSE FALSE FALSE FALSE
## [24,] FALSE FALSE FALSE FALSE FALSE
## [25,] FALSE FALSE FALSE FALSE FALSE
## [26,] FALSE FALSE FALSE FALSE FALSE
## [27,] FALSE FALSE FALSE FALSE FALSE
## [28,] FALSE FALSE FALSE FALSE FALSE
## [29,] FALSE FALSE FALSE FALSE FALSE
## [30,] FALSE FALSE FALSE FALSE FALSE
## [31,] FALSE FALSE FALSE FALSE FALSE
## [32,] FALSE FALSE FALSE FALSE FALSE
## [33,] FALSE FALSE FALSE FALSE FALSE
## [34,] FALSE FALSE FALSE FALSE FALSE
## [35,] FALSE FALSE FALSE FALSE FALSE
## [36,] FALSE FALSE FALSE FALSE FALSE
## [37,] FALSE FALSE FALSE FALSE FALSE
## [38,] FALSE FALSE FALSE FALSE FALSE
## [39,] FALSE FALSE FALSE FALSE FALSE
## [40,] FALSE FALSE FALSE FALSE FALSE
## [41,] FALSE FALSE FALSE FALSE FALSE
## [42,] FALSE FALSE FALSE FALSE FALSE
## [43,] FALSE FALSE FALSE FALSE FALSE
## [44,] FALSE FALSE FALSE FALSE FALSE
## [45,] FALSE FALSE FALSE FALSE FALSE
## [46,] FALSE FALSE FALSE FALSE FALSE
## [47,] FALSE FALSE FALSE FALSE FALSE
## [48,] FALSE FALSE FALSE FALSE FALSE
## [49,] FALSE FALSE FALSE FALSE FALSE
## [50,] FALSE FALSE FALSE FALSE FALSE
## [51,] FALSE FALSE FALSE FALSE FALSE
## [52,] FALSE FALSE FALSE FALSE FALSE
## [53,] FALSE FALSE FALSE FALSE FALSE
## [54,] FALSE FALSE FALSE FALSE FALSE
## [55,] FALSE FALSE FALSE FALSE FALSE
## [56,] FALSE FALSE FALSE FALSE FALSE
## [57,] FALSE FALSE FALSE FALSE FALSE
## [58,] FALSE FALSE FALSE FALSE FALSE
## [59,] FALSE FALSE FALSE FALSE FALSE
## [60,] FALSE FALSE FALSE FALSE FALSE
## [61,] FALSE FALSE FALSE FALSE FALSE
## [62,] FALSE FALSE FALSE FALSE FALSE
## [63,] FALSE FALSE FALSE FALSE FALSE
## [64,] FALSE FALSE FALSE FALSE FALSE
## [65,] FALSE FALSE FALSE FALSE FALSE
## [66,] FALSE FALSE FALSE FALSE FALSE
## [67,] FALSE FALSE FALSE FALSE FALSE
## [68,] FALSE FALSE FALSE FALSE FALSE
## [69,] FALSE FALSE FALSE FALSE FALSE
## [70,] FALSE FALSE FALSE FALSE FALSE
## [71,] FALSE FALSE FALSE FALSE FALSE
## [72,] FALSE FALSE FALSE FALSE FALSE
## [73,] FALSE FALSE FALSE FALSE FALSE
## [74,] FALSE FALSE FALSE FALSE FALSE
## [75,] FALSE FALSE FALSE FALSE FALSE
## [76,] FALSE FALSE FALSE FALSE FALSE
## [77,] FALSE FALSE FALSE FALSE FALSE
## [78,] FALSE FALSE FALSE FALSE FALSE
## [79,] FALSE FALSE FALSE FALSE FALSE
## [80,] FALSE FALSE FALSE FALSE FALSE
## [81,] FALSE FALSE FALSE FALSE FALSE
## [82,] FALSE FALSE FALSE FALSE FALSE
## [83,] FALSE FALSE FALSE FALSE FALSE
## [84,] FALSE FALSE FALSE FALSE FALSE
## [85,] FALSE FALSE FALSE FALSE FALSE
## [86,] FALSE FALSE FALSE FALSE FALSE
## [87,] FALSE FALSE FALSE FALSE FALSE
## [88,] FALSE FALSE FALSE FALSE FALSE
## [89,] FALSE FALSE FALSE FALSE FALSE
## [90,] FALSE FALSE FALSE FALSE FALSE
## [91,] FALSE FALSE FALSE FALSE FALSE
## [92,] FALSE FALSE FALSE FALSE FALSE
## [93,] FALSE FALSE FALSE FALSE FALSE
## [94,] FALSE FALSE FALSE FALSE FALSE
## [95,] FALSE FALSE FALSE FALSE FALSE
## [96,] FALSE FALSE FALSE FALSE FALSE
## [97,] FALSE FALSE FALSE FALSE FALSE
## [98,] FALSE FALSE FALSE FALSE FALSE
## [99,] FALSE FALSE FALSE FALSE FALSE
## [100,] FALSE FALSE FALSE FALSE FALSE
## [101,] FALSE FALSE FALSE FALSE FALSE
## [102,] FALSE FALSE FALSE FALSE FALSE
## [103,] FALSE FALSE FALSE FALSE FALSE
## [104,] FALSE FALSE FALSE FALSE FALSE
## [105,] FALSE FALSE FALSE FALSE FALSE
## [106,] FALSE FALSE FALSE FALSE FALSE
## [107,] FALSE FALSE FALSE FALSE FALSE
## [108,] FALSE FALSE FALSE FALSE FALSE
## [109,] FALSE FALSE FALSE FALSE FALSE
## [110,] FALSE FALSE FALSE FALSE FALSE
## [111,] FALSE FALSE FALSE FALSE FALSE
## [112,] FALSE FALSE FALSE FALSE FALSE
## [113,] FALSE FALSE FALSE FALSE FALSE
## [114,] FALSE FALSE FALSE FALSE FALSE
## [115,] FALSE FALSE FALSE FALSE FALSE
## [116,] FALSE FALSE FALSE FALSE FALSE
## [117,] FALSE FALSE FALSE FALSE FALSE
## [118,] FALSE FALSE FALSE FALSE FALSE
## [119,] FALSE FALSE FALSE FALSE FALSE
## [120,] FALSE FALSE FALSE FALSE FALSE
## [121,] FALSE FALSE FALSE FALSE FALSE
## [122,] FALSE FALSE FALSE FALSE FALSE
## [123,] FALSE FALSE FALSE FALSE FALSE
## [124,] FALSE FALSE FALSE FALSE FALSE
## [125,] FALSE FALSE FALSE FALSE FALSE
## [126,] FALSE FALSE FALSE FALSE FALSE
## [127,] FALSE FALSE FALSE FALSE FALSE
## [128,] FALSE FALSE FALSE FALSE FALSE
## [129,] FALSE FALSE FALSE FALSE FALSE
## [130,] FALSE FALSE FALSE FALSE FALSE
## [131,] FALSE FALSE FALSE FALSE FALSE
## [132,] FALSE FALSE FALSE FALSE FALSE
## [133,] FALSE FALSE FALSE FALSE FALSE
## [134,] FALSE FALSE FALSE FALSE FALSE
## [135,] FALSE FALSE FALSE FALSE FALSE
## [136,] FALSE FALSE FALSE FALSE FALSE
## [137,] FALSE FALSE FALSE FALSE FALSE
## [138,] FALSE FALSE FALSE FALSE FALSE
## [139,] FALSE FALSE FALSE FALSE FALSE
## [140,] FALSE FALSE FALSE FALSE FALSE
## [141,] FALSE FALSE FALSE FALSE FALSE
## [142,] FALSE FALSE FALSE FALSE FALSE
## [143,] FALSE FALSE FALSE FALSE FALSE
## [144,] FALSE FALSE FALSE FALSE FALSE
## [145,] FALSE FALSE FALSE FALSE FALSE
## [146,] FALSE FALSE FALSE FALSE FALSE
## [147,] FALSE FALSE FALSE FALSE FALSE
## [148,] FALSE FALSE FALSE FALSE FALSE
## [149,] FALSE FALSE FALSE FALSE FALSE
## [150,] FALSE FALSE FALSE FALSE FALSE
## [151,] FALSE FALSE FALSE FALSE FALSE
## [152,] FALSE FALSE FALSE FALSE FALSE
## [153,] FALSE FALSE FALSE FALSE FALSE
## [154,] FALSE FALSE FALSE FALSE FALSE
## [155,] FALSE FALSE FALSE FALSE FALSE
## [156,] FALSE FALSE FALSE FALSE FALSE
## [157,] FALSE FALSE FALSE FALSE FALSE
## [158,] FALSE FALSE FALSE FALSE FALSE
## [159,] FALSE FALSE FALSE FALSE FALSE
## [160,] FALSE FALSE FALSE FALSE FALSE
## [161,] FALSE FALSE FALSE FALSE FALSE
## [162,] FALSE FALSE FALSE FALSE FALSE
## [163,] FALSE FALSE FALSE FALSE FALSE
## [164,] FALSE FALSE FALSE FALSE FALSE
## [165,] FALSE FALSE FALSE FALSE FALSE
## [166,] FALSE FALSE FALSE FALSE FALSE
## [167,] FALSE FALSE FALSE FALSE FALSE
## [168,] FALSE FALSE FALSE FALSE FALSE
## [169,] FALSE FALSE FALSE FALSE FALSE
## [170,] FALSE FALSE FALSE FALSE FALSE
## [171,] FALSE FALSE FALSE FALSE FALSE
## [172,] FALSE FALSE FALSE FALSE FALSE
## [173,] FALSE FALSE FALSE FALSE FALSE
## [174,] FALSE FALSE FALSE FALSE FALSE
## [175,] FALSE FALSE FALSE FALSE FALSE
## [176,] FALSE FALSE FALSE FALSE FALSE
## [177,] FALSE FALSE FALSE FALSE FALSE
## [178,] FALSE FALSE FALSE FALSE FALSE
## [179,] FALSE FALSE FALSE FALSE FALSE
## [180,] FALSE FALSE FALSE FALSE FALSE
## [181,] FALSE FALSE FALSE FALSE FALSE
## [182,] FALSE FALSE FALSE FALSE FALSE
## [183,] FALSE FALSE FALSE FALSE FALSE
## [184,] FALSE FALSE FALSE FALSE FALSE
## [185,] FALSE FALSE FALSE FALSE FALSE
## [186,] FALSE FALSE FALSE FALSE FALSE
## [187,] FALSE FALSE FALSE FALSE FALSE
## [188,] FALSE FALSE FALSE FALSE FALSE
## [189,] FALSE FALSE FALSE FALSE FALSE
## [190,] FALSE FALSE FALSE FALSE FALSE
## [191,] FALSE FALSE FALSE FALSE FALSE
## [192,] FALSE FALSE FALSE FALSE FALSE
## [193,] FALSE FALSE FALSE FALSE FALSE
## [194,] FALSE FALSE FALSE FALSE FALSE
## [195,] FALSE FALSE FALSE FALSE FALSE
## [196,] FALSE FALSE FALSE FALSE FALSE
## [197,] FALSE FALSE FALSE FALSE FALSE
## [198,] FALSE FALSE FALSE FALSE FALSE
## [199,] FALSE FALSE FALSE FALSE FALSE
## [200,] FALSE FALSE FALSE FALSE FALSE
## [201,] FALSE FALSE FALSE FALSE FALSE
## [202,] FALSE FALSE FALSE FALSE FALSE
## [203,] FALSE FALSE FALSE FALSE FALSE
## [204,] FALSE FALSE FALSE FALSE FALSE
## [205,] FALSE FALSE FALSE FALSE FALSE
## [206,] FALSE FALSE FALSE FALSE FALSE
## [207,] FALSE FALSE FALSE FALSE FALSE
## [208,] FALSE FALSE FALSE FALSE FALSE
## [209,] FALSE FALSE FALSE FALSE FALSE
## [210,] FALSE FALSE FALSE FALSE FALSE
## [211,] FALSE FALSE FALSE FALSE FALSE
## [212,] FALSE FALSE FALSE FALSE FALSE
## [213,] FALSE FALSE FALSE FALSE FALSE
## [214,] FALSE FALSE FALSE FALSE FALSE
## [215,] FALSE FALSE FALSE FALSE FALSE
## [216,] FALSE FALSE FALSE FALSE FALSE
## [217,] FALSE FALSE FALSE FALSE FALSE
## [218,] FALSE FALSE FALSE FALSE FALSE
## [219,] FALSE FALSE FALSE FALSE FALSE
## [220,] FALSE FALSE FALSE FALSE FALSE
## [221,] FALSE FALSE FALSE FALSE FALSE
## [222,] FALSE FALSE FALSE FALSE FALSE
## [223,] FALSE FALSE FALSE FALSE FALSE
## [224,] FALSE FALSE FALSE FALSE FALSE
## [225,] FALSE FALSE FALSE FALSE FALSE
## [226,] FALSE FALSE FALSE FALSE FALSE
## [227,] FALSE FALSE FALSE FALSE FALSE
## [228,] FALSE FALSE FALSE FALSE FALSE
## [229,] FALSE FALSE FALSE FALSE FALSE
## [230,] FALSE FALSE FALSE FALSE FALSE
## [231,] FALSE FALSE FALSE FALSE FALSE
## [232,] FALSE FALSE FALSE FALSE FALSE
## [233,] FALSE FALSE FALSE FALSE FALSE
## [234,] FALSE FALSE FALSE FALSE FALSE
## [235,] FALSE FALSE FALSE FALSE FALSE
## [236,] FALSE FALSE FALSE FALSE FALSE
## [237,] FALSE FALSE FALSE FALSE FALSE
## [238,] FALSE FALSE FALSE FALSE FALSE
## [239,] FALSE FALSE FALSE FALSE FALSE
## [240,] FALSE FALSE FALSE FALSE FALSE
## [241,] FALSE FALSE FALSE FALSE FALSE
## [242,] FALSE FALSE FALSE FALSE FALSE
## [243,] FALSE FALSE FALSE FALSE FALSE
## [244,] FALSE FALSE FALSE FALSE FALSE
## [245,] FALSE FALSE FALSE FALSE FALSE
## [246,] FALSE FALSE FALSE FALSE FALSE
## [247,] FALSE FALSE FALSE FALSE FALSE
## [248,] FALSE FALSE FALSE FALSE FALSE
## [249,] FALSE FALSE FALSE FALSE FALSE
## [250,] FALSE FALSE FALSE FALSE FALSE
## [251,] FALSE FALSE FALSE FALSE FALSE
## [252,] FALSE FALSE FALSE FALSE FALSE
## [253,] FALSE FALSE FALSE FALSE FALSE
## [254,] FALSE FALSE FALSE FALSE FALSE
## [255,] FALSE FALSE FALSE FALSE FALSE
## [256,] FALSE FALSE FALSE FALSE FALSE
## [257,] FALSE FALSE FALSE FALSE FALSE
## [258,] FALSE FALSE FALSE FALSE FALSE
## [259,] FALSE FALSE FALSE FALSE FALSE
## [260,] FALSE FALSE FALSE FALSE FALSE
## [261,] FALSE FALSE FALSE FALSE FALSE
## [262,] FALSE FALSE FALSE FALSE FALSE
## [263,] FALSE FALSE FALSE FALSE FALSE
## [264,] FALSE FALSE FALSE FALSE FALSE
## [265,] FALSE FALSE FALSE FALSE FALSE
## [266,] FALSE FALSE FALSE FALSE FALSE
## [267,] FALSE FALSE FALSE FALSE FALSE
## [268,] FALSE FALSE FALSE FALSE FALSE
## [269,] FALSE FALSE FALSE FALSE FALSE
## [270,] FALSE FALSE FALSE FALSE FALSE
## [271,] FALSE FALSE FALSE FALSE FALSE
## [272,] FALSE FALSE FALSE FALSE FALSE
## [273,] FALSE FALSE FALSE FALSE FALSE
## [274,] FALSE FALSE FALSE FALSE FALSE
## [275,] FALSE FALSE FALSE FALSE FALSE
## [276,] FALSE FALSE FALSE FALSE FALSE
## Total Spores (M) Total ul spore Infection Round 40X OP50 (mL) Plate Size
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE
## [11,] FALSE FALSE FALSE FALSE FALSE
## [12,] FALSE FALSE FALSE FALSE FALSE
## [13,] FALSE FALSE FALSE FALSE FALSE
## [14,] FALSE FALSE FALSE FALSE FALSE
## [15,] FALSE FALSE FALSE FALSE FALSE
## [16,] FALSE FALSE FALSE FALSE FALSE
## [17,] FALSE FALSE FALSE FALSE FALSE
## [18,] FALSE FALSE FALSE FALSE FALSE
## [19,] FALSE FALSE FALSE FALSE FALSE
## [20,] FALSE FALSE FALSE FALSE FALSE
## [21,] FALSE FALSE FALSE FALSE FALSE
## [22,] FALSE FALSE FALSE FALSE FALSE
## [23,] FALSE FALSE FALSE FALSE FALSE
## [24,] FALSE FALSE FALSE FALSE FALSE
## [25,] FALSE FALSE FALSE FALSE FALSE
## [26,] FALSE FALSE FALSE FALSE FALSE
## [27,] FALSE FALSE FALSE FALSE FALSE
## [28,] FALSE FALSE FALSE FALSE FALSE
## [29,] FALSE FALSE FALSE FALSE FALSE
## [30,] FALSE FALSE FALSE FALSE FALSE
## [31,] FALSE FALSE FALSE FALSE FALSE
## [32,] FALSE FALSE FALSE FALSE FALSE
## [33,] FALSE FALSE FALSE FALSE FALSE
## [34,] FALSE FALSE FALSE FALSE FALSE
## [35,] FALSE FALSE FALSE FALSE FALSE
## [36,] FALSE FALSE FALSE FALSE FALSE
## [37,] FALSE FALSE FALSE FALSE FALSE
## [38,] FALSE FALSE FALSE FALSE FALSE
## [39,] FALSE FALSE FALSE FALSE FALSE
## [40,] FALSE FALSE FALSE FALSE FALSE
## [41,] FALSE FALSE FALSE FALSE FALSE
## [42,] FALSE FALSE FALSE FALSE FALSE
## [43,] FALSE FALSE FALSE FALSE FALSE
## [44,] FALSE FALSE FALSE FALSE FALSE
## [45,] FALSE FALSE FALSE FALSE FALSE
## [46,] FALSE FALSE FALSE FALSE FALSE
## [47,] FALSE FALSE FALSE FALSE FALSE
## [48,] FALSE FALSE FALSE FALSE FALSE
## [49,] FALSE FALSE FALSE FALSE FALSE
## [50,] FALSE FALSE FALSE FALSE FALSE
## [51,] FALSE FALSE FALSE FALSE FALSE
## [52,] FALSE FALSE FALSE FALSE FALSE
## [53,] FALSE FALSE FALSE FALSE FALSE
## [54,] FALSE FALSE FALSE FALSE FALSE
## [55,] FALSE FALSE FALSE FALSE FALSE
## [56,] FALSE FALSE FALSE FALSE FALSE
## [57,] FALSE FALSE FALSE FALSE FALSE
## [58,] FALSE FALSE FALSE FALSE FALSE
## [59,] FALSE FALSE FALSE FALSE FALSE
## [60,] FALSE FALSE FALSE FALSE FALSE
## [61,] FALSE FALSE FALSE FALSE FALSE
## [62,] FALSE FALSE FALSE FALSE FALSE
## [63,] FALSE FALSE FALSE FALSE FALSE
## [64,] FALSE FALSE FALSE FALSE FALSE
## [65,] FALSE FALSE FALSE FALSE FALSE
## [66,] FALSE FALSE FALSE FALSE FALSE
## [67,] FALSE FALSE FALSE FALSE FALSE
## [68,] FALSE FALSE FALSE FALSE FALSE
## [69,] FALSE FALSE FALSE FALSE FALSE
## [70,] FALSE FALSE FALSE FALSE FALSE
## [71,] FALSE FALSE FALSE FALSE FALSE
## [72,] FALSE FALSE FALSE FALSE FALSE
## [73,] FALSE FALSE FALSE FALSE FALSE
## [74,] FALSE FALSE FALSE FALSE FALSE
## [75,] FALSE FALSE FALSE FALSE FALSE
## [76,] FALSE FALSE FALSE FALSE FALSE
## [77,] FALSE FALSE FALSE FALSE FALSE
## [78,] FALSE FALSE FALSE FALSE FALSE
## [79,] FALSE FALSE FALSE FALSE FALSE
## [80,] FALSE FALSE FALSE FALSE FALSE
## [81,] FALSE FALSE FALSE FALSE FALSE
## [82,] FALSE FALSE FALSE FALSE FALSE
## [83,] FALSE FALSE FALSE FALSE FALSE
## [84,] FALSE FALSE FALSE FALSE FALSE
## [85,] FALSE FALSE FALSE FALSE FALSE
## [86,] FALSE FALSE FALSE FALSE FALSE
## [87,] FALSE FALSE FALSE FALSE FALSE
## [88,] FALSE FALSE FALSE FALSE FALSE
## [89,] FALSE FALSE FALSE FALSE FALSE
## [90,] FALSE FALSE FALSE FALSE FALSE
## [91,] FALSE FALSE FALSE FALSE FALSE
## [92,] FALSE FALSE FALSE FALSE FALSE
## [93,] FALSE FALSE FALSE FALSE FALSE
## [94,] FALSE FALSE FALSE FALSE FALSE
## [95,] FALSE FALSE FALSE FALSE FALSE
## [96,] FALSE FALSE FALSE FALSE FALSE
## [97,] FALSE FALSE FALSE FALSE FALSE
## [98,] FALSE FALSE FALSE FALSE FALSE
## [99,] FALSE FALSE FALSE FALSE FALSE
## [100,] FALSE FALSE FALSE FALSE FALSE
## [101,] FALSE FALSE FALSE FALSE FALSE
## [102,] FALSE FALSE FALSE FALSE FALSE
## [103,] FALSE FALSE FALSE FALSE FALSE
## [104,] FALSE FALSE FALSE FALSE FALSE
## [105,] FALSE FALSE FALSE FALSE FALSE
## [106,] FALSE FALSE FALSE FALSE FALSE
## [107,] FALSE FALSE FALSE FALSE FALSE
## [108,] FALSE FALSE FALSE FALSE FALSE
## [109,] FALSE FALSE FALSE FALSE FALSE
## [110,] FALSE FALSE FALSE FALSE FALSE
## [111,] FALSE FALSE FALSE FALSE FALSE
## [112,] FALSE FALSE FALSE FALSE FALSE
## [113,] FALSE FALSE FALSE FALSE FALSE
## [114,] FALSE FALSE FALSE FALSE FALSE
## [115,] FALSE FALSE FALSE FALSE FALSE
## [116,] FALSE FALSE FALSE FALSE FALSE
## [117,] FALSE FALSE FALSE FALSE FALSE
## [118,] FALSE FALSE FALSE FALSE FALSE
## [119,] FALSE FALSE FALSE FALSE FALSE
## [120,] FALSE FALSE FALSE FALSE FALSE
## [121,] FALSE FALSE FALSE FALSE FALSE
## [122,] FALSE FALSE FALSE FALSE FALSE
## [123,] FALSE FALSE FALSE FALSE FALSE
## [124,] FALSE FALSE FALSE FALSE FALSE
## [125,] FALSE FALSE FALSE FALSE FALSE
## [126,] FALSE FALSE FALSE FALSE FALSE
## [127,] FALSE FALSE FALSE FALSE FALSE
## [128,] FALSE FALSE FALSE FALSE FALSE
## [129,] FALSE FALSE FALSE FALSE FALSE
## [130,] FALSE FALSE FALSE FALSE FALSE
## [131,] FALSE FALSE FALSE FALSE FALSE
## [132,] FALSE FALSE FALSE FALSE FALSE
## [133,] FALSE FALSE FALSE FALSE FALSE
## [134,] FALSE FALSE FALSE FALSE FALSE
## [135,] FALSE FALSE FALSE FALSE FALSE
## [136,] FALSE FALSE FALSE FALSE FALSE
## [137,] FALSE FALSE FALSE FALSE FALSE
## [138,] FALSE FALSE FALSE FALSE FALSE
## [139,] FALSE FALSE FALSE FALSE FALSE
## [140,] FALSE FALSE FALSE FALSE FALSE
## [141,] FALSE FALSE FALSE FALSE FALSE
## [142,] FALSE FALSE FALSE FALSE FALSE
## [143,] FALSE FALSE FALSE FALSE FALSE
## [144,] FALSE FALSE FALSE FALSE FALSE
## [145,] FALSE FALSE FALSE FALSE FALSE
## [146,] FALSE FALSE FALSE FALSE FALSE
## [147,] FALSE FALSE FALSE FALSE FALSE
## [148,] FALSE FALSE FALSE FALSE FALSE
## [149,] FALSE FALSE FALSE FALSE FALSE
## [150,] FALSE FALSE FALSE FALSE FALSE
## [151,] FALSE FALSE FALSE FALSE FALSE
## [152,] FALSE FALSE FALSE FALSE FALSE
## [153,] FALSE FALSE FALSE FALSE FALSE
## [154,] FALSE FALSE FALSE FALSE FALSE
## [155,] FALSE FALSE FALSE FALSE FALSE
## [156,] FALSE FALSE FALSE FALSE FALSE
## [157,] FALSE FALSE FALSE FALSE FALSE
## [158,] FALSE FALSE FALSE FALSE FALSE
## [159,] FALSE FALSE FALSE FALSE FALSE
## [160,] FALSE FALSE FALSE FALSE FALSE
## [161,] FALSE FALSE FALSE FALSE FALSE
## [162,] FALSE FALSE FALSE FALSE FALSE
## [163,] FALSE FALSE FALSE FALSE FALSE
## [164,] FALSE FALSE FALSE FALSE FALSE
## [165,] FALSE FALSE FALSE FALSE FALSE
## [166,] FALSE FALSE FALSE FALSE FALSE
## [167,] FALSE FALSE FALSE FALSE FALSE
## [168,] FALSE FALSE FALSE FALSE FALSE
## [169,] FALSE FALSE FALSE FALSE FALSE
## [170,] FALSE FALSE FALSE FALSE FALSE
## [171,] FALSE FALSE FALSE FALSE FALSE
## [172,] FALSE FALSE FALSE FALSE FALSE
## [173,] FALSE FALSE FALSE FALSE FALSE
## [174,] FALSE FALSE FALSE FALSE FALSE
## [175,] FALSE FALSE FALSE FALSE FALSE
## [176,] FALSE FALSE FALSE FALSE FALSE
## [177,] FALSE FALSE FALSE FALSE FALSE
## [178,] FALSE FALSE FALSE FALSE FALSE
## [179,] FALSE FALSE FALSE FALSE FALSE
## [180,] FALSE FALSE FALSE FALSE FALSE
## [181,] FALSE FALSE FALSE FALSE FALSE
## [182,] FALSE FALSE FALSE FALSE FALSE
## [183,] FALSE FALSE FALSE FALSE FALSE
## [184,] FALSE FALSE FALSE FALSE FALSE
## [185,] FALSE FALSE FALSE FALSE FALSE
## [186,] FALSE FALSE FALSE FALSE FALSE
## [187,] FALSE FALSE FALSE FALSE FALSE
## [188,] FALSE FALSE FALSE FALSE FALSE
## [189,] FALSE FALSE FALSE FALSE FALSE
## [190,] FALSE FALSE FALSE FALSE FALSE
## [191,] FALSE FALSE FALSE FALSE FALSE
## [192,] FALSE FALSE FALSE FALSE FALSE
## [193,] FALSE FALSE FALSE FALSE FALSE
## [194,] FALSE FALSE FALSE FALSE FALSE
## [195,] FALSE FALSE FALSE FALSE FALSE
## [196,] FALSE FALSE FALSE FALSE FALSE
## [197,] FALSE FALSE FALSE FALSE FALSE
## [198,] FALSE FALSE FALSE FALSE FALSE
## [199,] FALSE FALSE FALSE FALSE FALSE
## [200,] FALSE FALSE FALSE FALSE FALSE
## [201,] FALSE FALSE FALSE FALSE FALSE
## [202,] FALSE FALSE FALSE FALSE FALSE
## [203,] FALSE FALSE FALSE FALSE FALSE
## [204,] FALSE FALSE FALSE FALSE FALSE
## [205,] FALSE FALSE FALSE FALSE FALSE
## [206,] FALSE FALSE FALSE FALSE FALSE
## [207,] FALSE FALSE FALSE FALSE FALSE
## [208,] FALSE FALSE FALSE FALSE FALSE
## [209,] FALSE FALSE FALSE FALSE FALSE
## [210,] FALSE FALSE FALSE FALSE FALSE
## [211,] FALSE FALSE FALSE FALSE FALSE
## [212,] FALSE FALSE FALSE FALSE FALSE
## [213,] FALSE FALSE FALSE FALSE FALSE
## [214,] FALSE FALSE FALSE FALSE FALSE
## [215,] FALSE FALSE FALSE FALSE FALSE
## [216,] FALSE FALSE FALSE FALSE FALSE
## [217,] FALSE FALSE FALSE FALSE FALSE
## [218,] FALSE FALSE FALSE FALSE FALSE
## [219,] FALSE FALSE FALSE FALSE FALSE
## [220,] FALSE FALSE FALSE FALSE FALSE
## [221,] FALSE FALSE FALSE FALSE FALSE
## [222,] FALSE FALSE FALSE FALSE FALSE
## [223,] FALSE FALSE FALSE FALSE FALSE
## [224,] FALSE FALSE FALSE FALSE FALSE
## [225,] FALSE FALSE FALSE FALSE FALSE
## [226,] FALSE FALSE FALSE FALSE FALSE
## [227,] FALSE FALSE FALSE FALSE FALSE
## [228,] FALSE FALSE FALSE FALSE FALSE
## [229,] FALSE FALSE FALSE FALSE FALSE
## [230,] FALSE FALSE FALSE FALSE FALSE
## [231,] FALSE FALSE FALSE FALSE FALSE
## [232,] FALSE FALSE FALSE FALSE FALSE
## [233,] FALSE FALSE FALSE FALSE FALSE
## [234,] FALSE FALSE FALSE FALSE FALSE
## [235,] FALSE FALSE FALSE FALSE FALSE
## [236,] FALSE FALSE FALSE FALSE FALSE
## [237,] FALSE FALSE FALSE FALSE FALSE
## [238,] FALSE FALSE FALSE FALSE FALSE
## [239,] FALSE FALSE FALSE FALSE FALSE
## [240,] FALSE FALSE FALSE FALSE FALSE
## [241,] FALSE FALSE FALSE FALSE FALSE
## [242,] FALSE FALSE FALSE FALSE FALSE
## [243,] FALSE FALSE FALSE FALSE FALSE
## [244,] FALSE FALSE FALSE FALSE FALSE
## [245,] FALSE FALSE FALSE FALSE FALSE
## [246,] FALSE FALSE FALSE FALSE FALSE
## [247,] FALSE FALSE FALSE FALSE FALSE
## [248,] FALSE FALSE FALSE FALSE FALSE
## [249,] FALSE FALSE FALSE FALSE FALSE
## [250,] FALSE FALSE FALSE FALSE FALSE
## [251,] FALSE FALSE FALSE FALSE FALSE
## [252,] FALSE FALSE FALSE FALSE FALSE
## [253,] FALSE FALSE FALSE FALSE FALSE
## [254,] FALSE FALSE FALSE FALSE FALSE
## [255,] FALSE FALSE FALSE FALSE FALSE
## [256,] FALSE FALSE FALSE FALSE FALSE
## [257,] FALSE FALSE FALSE FALSE FALSE
## [258,] FALSE FALSE FALSE FALSE FALSE
## [259,] FALSE FALSE FALSE FALSE FALSE
## [260,] FALSE FALSE FALSE FALSE FALSE
## [261,] FALSE FALSE FALSE FALSE FALSE
## [262,] FALSE FALSE FALSE FALSE FALSE
## [263,] FALSE FALSE FALSE FALSE FALSE
## [264,] FALSE FALSE FALSE FALSE FALSE
## [265,] FALSE FALSE FALSE FALSE FALSE
## [266,] FALSE FALSE FALSE FALSE FALSE
## [267,] FALSE FALSE FALSE FALSE FALSE
## [268,] FALSE FALSE FALSE FALSE FALSE
## [269,] FALSE FALSE FALSE FALSE FALSE
## [270,] FALSE FALSE FALSE FALSE FALSE
## [271,] FALSE FALSE FALSE FALSE FALSE
## [272,] FALSE FALSE FALSE FALSE FALSE
## [273,] FALSE FALSE FALSE FALSE FALSE
## [274,] FALSE FALSE FALSE FALSE FALSE
## [275,] FALSE FALSE FALSE FALSE FALSE
## [276,] FALSE FALSE FALSE FALSE FALSE
## Spores(M)/cm2 Time plated Time Incubated Temp timepoint infection.type
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE
## [11,] FALSE FALSE FALSE FALSE FALSE FALSE
## [12,] FALSE FALSE FALSE FALSE FALSE FALSE
## [13,] FALSE FALSE FALSE FALSE FALSE FALSE
## [14,] FALSE FALSE FALSE FALSE FALSE FALSE
## [15,] FALSE FALSE FALSE FALSE FALSE FALSE
## [16,] FALSE FALSE FALSE FALSE FALSE FALSE
## [17,] FALSE FALSE FALSE FALSE FALSE FALSE
## [18,] FALSE FALSE FALSE FALSE FALSE FALSE
## [19,] FALSE FALSE FALSE FALSE FALSE FALSE
## [20,] FALSE FALSE FALSE FALSE FALSE FALSE
## [21,] FALSE FALSE FALSE FALSE FALSE FALSE
## [22,] FALSE FALSE FALSE FALSE FALSE FALSE
## [23,] FALSE FALSE FALSE FALSE FALSE FALSE
## [24,] FALSE FALSE FALSE FALSE FALSE FALSE
## [25,] FALSE FALSE FALSE FALSE FALSE FALSE
## [26,] FALSE FALSE FALSE FALSE FALSE FALSE
## [27,] FALSE FALSE FALSE FALSE FALSE FALSE
## [28,] FALSE FALSE FALSE FALSE FALSE FALSE
## [29,] FALSE FALSE FALSE FALSE FALSE FALSE
## [30,] FALSE FALSE FALSE FALSE FALSE FALSE
## [31,] FALSE FALSE FALSE FALSE FALSE FALSE
## [32,] FALSE FALSE FALSE FALSE FALSE FALSE
## [33,] FALSE FALSE FALSE FALSE FALSE FALSE
## [34,] FALSE FALSE FALSE FALSE FALSE FALSE
## [35,] FALSE FALSE FALSE FALSE FALSE FALSE
## [36,] FALSE FALSE FALSE FALSE FALSE FALSE
## [37,] FALSE FALSE FALSE FALSE FALSE FALSE
## [38,] FALSE FALSE FALSE FALSE FALSE FALSE
## [39,] FALSE FALSE FALSE FALSE FALSE FALSE
## [40,] FALSE FALSE FALSE FALSE FALSE FALSE
## ...
## [63,] FALSE FALSE FALSE FALSE FALSE FALSE
## [64,] FALSE TRUE TRUE FALSE FALSE FALSE
## ...
## [111,] FALSE TRUE TRUE FALSE FALSE FALSE
## [112,] FALSE TRUE FALSE FALSE FALSE FALSE
## ...
## [276,] FALSE TRUE FALSE FALSE FALSE FALSE
## (output truncated: rows 40-276 of this column block; TRUE appears in columns 2-3 for rows 64-111 and in column 2 for rows 112-276)
## Fixing Date Location Staining Date Stain type Slide date Slide number
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## ...
## [37,] FALSE TRUE FALSE FALSE FALSE FALSE
## ...
## [84,] FALSE TRUE FALSE FALSE FALSE FALSE
## [85,] FALSE FALSE FALSE FALSE FALSE FALSE
## ...
## [276,] FALSE FALSE FALSE FALSE FALSE FALSE
## (output truncated: TRUE appears only in the Location column, for rows 37-39, 46-48, and 64-84)
## Slide Box Imaging Date
## [1,] FALSE FALSE
## [2,] FALSE FALSE
## ...
## [276,] FALSE FALSE
## (output truncated: all 276 rows are FALSE for these two columns)
The any() function evaluates logical vectors
In the case of large data frames, as you can see, there are just too many entries to inspect by eye. Sometimes we are only interested in knowing whether at least one of our logical values is TRUE. That is accomplished with the any() function, which can evaluate one or more vectors (or data.frames) and returns TRUE if the input contains at least one TRUE value.
We can use it to quickly ask whether our infection_meta.tbl data frame has any NA values.
# Before we dig too deep, can we check if there are ANY NA values in our data.frame?
any(is.na(infection_meta.tbl)) # logical (TRUE or FALSE).
## [1] TRUE
Now we’ve confirmed that there is at least a single NA
value in our data. Given that there are 276 rows with 29 columns (8004
total entries), we need to find a way to identify which rows contain
NA values and conversely those without NA
values. Let’s start with simple structures.
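Before hunting for individual positions, a quick per-column summary can show where the NAs concentrate. This is a minimal sketch using a small hypothetical stand-in data frame (demo.tbl is not the real infection_meta.tbl); the same call would work on the real tibble:

```r
# Hypothetical stand-in data frame (not the real infection_meta.tbl)
demo.tbl <- data.frame(a = c(1, NA, 3),
                       b = c("x", "y", NA),
                       c = c(TRUE, TRUE, FALSE))

# is.na() on a data frame returns a logical matrix,
# and colSums() counts the TRUEs (i.e. the NAs) per column
colSums(is.na(demo.tbl))
## a b c
## 1 1 0
```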
The which() function
Using is.na(), we were returned a size-matched logical structure indicating whether or not each value was NA. There are several ways to apply this information through different functions (as we saw with any()), but a particularly useful method for a vector of logicals is to ask which() indices (positions) are TRUE.
In our case, we use which() after checking for
NA values in our object.
# Take a look at na_vector before you start manipulating it
na_vector
## [1] 5 6 NA 7 7 NA
# wrap which() around our is.na() call
which(is.na(na_vector))
## [1] 3 6
# save the indices where NAs are present in na_vector
na_positions <- which(is.na(na_vector))
From above, we see that our NA values are located at
indices 3 and 6!
Use which() to filter your data!
Now that we have the results from our which() call, we know exactly which indices hold NA values. We can apply this directly to our original na_vector object to retrieve the non-NA values using the - (exclusion) syntax.
# cut out the na_values indices
removed_na_vector_1 <- na_vector[-na_positions]
# Check out the result
removed_na_vector_1
## [1] 5 6 7 7
Use !, the logical NOT, to invert your logical vectors
Something we haven't yet discussed in great detail is Boolean logic. We'll see more in later lectures, but one very helpful operator is !, also known as the logical NOT. In essence, it takes a logical value (or a group of logical values) and switches TRUE to FALSE and vice versa.
As we mentioned in Lecture 01, you can index your data structures with a series of logicals: TRUE to select, FALSE to exclude. We also know that is.na() produces a vector of logical values matching the indices of your input object. We can take this to the next level by combining the logical NOT with our is.na() results. This has the added bonus of avoiding the creation of an extra variable!
Let’s revisit this idea with our na_vector.
# Which values are NA?
is.na(na_vector)
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
# Flip the logical result
!is.na(na_vector)
## [1] TRUE TRUE FALSE TRUE TRUE FALSE
# Apply this in our code for conditional indexing
# indexing using a size-matched logical vector
removed_na_vector_2 <- na_vector[!is.na(na_vector)] ; removed_na_vector_2
## [1] 5 6 7 7
# compare to using which() to index by position
na_vector[-which(is.na(na_vector))]
## [1] 5 6 7 7
Conditional indexing: That’s right! We just used conditional indexing in the above section to remove NA values from our na_vector. A data structure of booleans (TRUE and FALSE) can be used to select elements from within another data structure, as long as the relevant dimensions match! This becomes extremely relevant when we begin to filter our data frames based on specific criteria.
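The same idea extends from vectors to data frames: a logical vector built from one column can subset whole rows. This is a self-contained sketch (the obs data frame here is hypothetical, made up for illustration):

```r
# A small hypothetical data frame with some missing counts
obs <- data.frame(sample = c("s1", "s2", "s3", "s4"),
                  count  = c(10, NA, 7, NA))

# Logical vector: TRUE where count is present
keep <- !is.na(obs$count)
keep
## [1]  TRUE FALSE  TRUE FALSE

# Conditional indexing on the row dimension keeps only the complete rows
obs[keep, ]
```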
Are there NA values within our tibble or data.frame?
We've been using a lot of examples with simple, small data structures, but infection_meta.tbl, as we saw in section 3.2.3, was much harder to view. That's where the proper use of which() can come in quite handy. Let's see how it works in direct usage.
# Which values in infection_meta.tbl are NA? Recall we have 276 rows of data!
which(is.na(infection_meta.tbl))
## [1] 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494
## [16] 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509
## [31] 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524
## [46] 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539
## [61] 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554
## [76] 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569
## [91] 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584
## [106] 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599
## [121] 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614
## [136] 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629
## [151] 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644
## [166] 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659
## [181] 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674
## [196] 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689
## [211] 4690 4691 4692 4756 4757 4758 4759 4760 4761 4762 4763 4764 4765 4766 4767
## [226] 4768 4769 4770 4771 4772 4773 4774 4775 4776 4777 4778 4779 4780 4781 4782
## [241] 4783 4784 4785 4786 4787 4788 4789 4790 4791 4792 4793 4794 4795 4796 4797
## [256] 4798 4799 4800 4801 4802 4803 6109 6110 6111 6118 6119 6120 6136 6137 6138
## [271] 6139 6140 6141 6142 6143 6144 6145 6146 6147 6148 6149 6150 6151 6152 6153
## [286] 6154 6155 6156
What do those values even mean? We are essentially seeing the
positions of each NA element in infection_meta.tbl where
the indices are assigned from top to bottom and then left to right. Thus
1-276 are values from column 1, 277-552 belong to column 2, etc.
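If you would rather see row and column positions than these column-major linear indices, base R's arrayInd() can translate them. A minimal sketch with a small made-up data frame:

```r
# A small made-up data frame with two NA values (hypothetical data)
df <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))

# which() on the whole data frame returns column-major positions:
# column 1 occupies positions 1-3, column 2 occupies positions 4-6
which(is.na(df))
## [1] 2 6

# arrayInd() converts those linear positions into row and column numbers
arrayInd(which(is.na(df)), dim(df))
##      [,1] [,2]
## [1,]    2    1
## [2,]    3    2
```

The same call works on infection_meta.tbl by swapping in its name.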
### Using complete.cases() to query larger objects

We have verified in many ways that we have at least one NA value in infection_meta.tbl. Often we may wish to drop incomplete observations where one or more variables is lacking data. The which() function would be helpful but, as we can see from our above example, it only returns the element order for the entire data.frame. Instead, we want to look for rows that have any NA values. If you were only concerned with NA values in a specific column of your dataframe, which() would be a good way to accomplish your task.
In the case of removing any incomplete rows, the function complete.cases() checks each row for missing values and returns a logical vector with one entry per row of the dataframe: TRUE where the row is complete and FALSE where it contains any NA. You can then subset out the rows containing NAs using conditional indexing.
# ?complete.cases
# Outputs a logical vector specifying which observations/rows have no missing values across the entire sequence.
head(complete.cases(infection_meta.tbl), 20)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE
# Use it wisely to keep complete rows. Pop quiz [x,y] will we index by x or by y?
str(infection_meta.tbl[complete.cases(infection_meta.tbl),])
## tibble [57 x 29] (S3: tbl_df/tbl/data.frame)
## $ experiment : chr [1:57] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
## $ experimenter : chr [1:57] "CM" "CM" "CM" "CM" ...
## $ description : chr [1:57] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection Date : num [1:57] 190423 190423 190423 190423 190423 ...
## $ Plate Number : num [1:57] 1 2 3 4 5 6 7 8 9 10 ...
## $ Worm_strain : chr [1:57] "VC20019" "VC20019" "VC20019" "N2" ...
## $ Total Worms : num [1:57] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore Strain : chr [1:57] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## $ Spore Lot : chr [1:57] "2A" "2A" "2A" "2A" ...
## $ Lot concentration: num [1:57] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## $ Total Spores (M) : num [1:57] 0 10 20 0 10 20 0 10 20 0 ...
## $ Total ul spore : num [1:57] 0 56.8 113.6 0 56.8 ...
## $ Infection Round : num [1:57] 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num [1:57] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate Size : num [1:57] 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores(M)/cm2 : num [1:57] 0 0.354 0.708 0 0.354 ...
## $ Time plated : num [1:57] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
## $ Time Incubated : num [1:57] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
## $ Temp : num [1:57] 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr [1:57] "72" "72" "72" "72" ...
## $ infection.type : chr [1:57] "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num [1:57] 190426 190426 190426 190426 190426 ...
## $ Location : chr [1:57] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining Date : num [1:57] 190513 190513 190513 190430 190513 ...
## $ Stain type : chr [1:57] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
## $ Slide date : num [1:57] 190515 190515 190515 190501 190515 ...
## $ Slide number : num [1:57] 1 2 3 4 5 6 7 8 9 10 ...
## $ Slide Box : num [1:57] 2 2 2 2 2 2 2 2 2 2 ...
## $ Imaging Date : num [1:57] 190516 190516 190516 190502 190516 ...
Use the any() function to identify whether any value (ie at least one) in a logical vector or expression evaluates to TRUE! This function returns a single logical value. It can be a very handy tool when you're concerned more with completeness than with individual values.
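For instance, with toy inputs:

```r
# any() collapses a logical vector down to a single TRUE/FALSE
any(c(FALSE, FALSE, TRUE))
## [1] TRUE

# Handy for asking "is anything missing at all?" in one step
x <- c(5, NA, 7)
any(is.na(x))
## [1] TRUE
```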
### Using which() with apply() to find where data might be missing

Hold up! We just removed our incomplete cases and went from 276 observations to a measly 57! Before we lose over 200 rows of our data, maybe we can take a quick look at where our NA values are located. Sometimes they exist in just a small number of columns that don't really have much importance.

Now that we have a few tools under our belt, let's figure out which() columns have any() values that are NA in our dataset. To do this, we'll rope in the apply() function to help us loop through each column individually.
# Use the apply function to find columns with NA and then determine which columns return TRUE
which(apply(infection_meta.tbl,
MARGIN = 2, # Use the columns
function(x) any(is.na(x)) # Here's our function to examine each column for NA values
) # end apply
) # end which
## Time plated Time Incubated Location
## 17 18 23
We can see from the results of our code that we really just have NA values in 3 metadata columns: Time plated, Time Incubated, and Location. What do you think the integer values in the resulting vector represent?
Use the anyNA() function as a shortcut for the combination of any() and is.na(): it asks with one function the same question that took two! You can play with the code above by replacing the function used in apply() with anyNA().
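A minimal sketch comparing the two forms on a toy vector:

```r
x <- c(5, NA, 7)

# The two-function form...
any(is.na(x))
## [1] TRUE

# ...and the one-function shortcut give the same answer
anyNA(x)
## [1] TRUE
```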
### Replacing NAs with something useful

Depending on your data or situation, you may want to include rows (observations) even though some aspects may be incomplete. Instead of dropping them, consider replacing the NAs in your data set. This could be replacement with a sample average, the mode of the data, or a value that is below a threshold.
# Replace the NA values in our table under the "Location" column.
# Note that this will permanently change our tibble!
infection_meta.tbl[is.na(infection_meta.tbl$Location), ]$Location <- "None"
# Check which columns have NA values now
which(apply(infection_meta.tbl, MARGIN = 2,
function(x) anyNA(x))) # Notice our use of the anyNA() function this time?
## Time plated Time Incubated
## 17 18
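The other replacement strategy mentioned above, substituting a sample average, can be sketched with a made-up vector using mean() with na.rm = TRUE:

```r
# Hypothetical measurements with one missing value
weights <- c(2.1, NA, 2.5, 2.3)

# Compute the mean of the observed values only
avg <- mean(weights, na.rm = TRUE)
avg
## [1] 2.3

# Overwrite the NA positions with that average
weights[is.na(weights)] <- avg
weights
## [1] 2.1 2.3 2.5 2.3
```

Whether an average is a sensible stand-in depends entirely on your data, so treat this as a pattern rather than a recommendation.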
More about NA values: To learn about a few more functions that you can use to identify and remove NA values from your data structure, check out the Appendix at the end of this lecture.
Comprehension Question 3.0.0: Replace the NA values in Time plated and Time Incubated with the values 1300 and 1600 respectively. You can do this by completing the skeleton code below where we’ll make a copy of the tibble to work on called “comprehension_meta.tbl”. Check for NA values afterwards!
# comprehension answer code 3.0.0
# Copy our table over to new version
comprehension_meta.tbl <- infection_meta.tbl
# Fix the Time plated column
comprehension_meta.tbl[...), ]$... <- 1300
# Fix the Time Incubated column
comprehension_meta.tbl[...), ]$... <- 1600
# Will we have any NA values left?
any(is.na(comprehension_meta.tbl))
## Error: <text>:7:27: unexpected ')'
## 6: # Fix the Time plated column
## 7: comprehension_meta.tbl[...)
## ^
### The dplyr (DEE ply er) package

Now that we've inspected our data for various pitfalls, we can move on to filtering and sorting. Before we answer any questions with our data, we need the ability to select and filter parts of our data. This can be accomplished with base functions in R, but the dplyr package provides a more human-readable syntax.
Image courtesy of xkcd at https://xkcd.com/1906/
The dplyr package was made by Hadley Wickham to help
make data frame manipulation easier. There are 5 major types of
functions that we are concerned with in today’s lecture:
- filter() - subsets your data.frame by row
- select() - subsets your data.frame by columns
- arrange() - orders your data.frame alphabetically or numerically by ascending or descending variables
- mutate(), transmute() - create a new column of data
- summarize() or summarise() - reduces data to summary values (for example using mean(), sd(), min(), quantile(), etc)

There's always more to explore (in dplyr)! Although we are focused on just a handful of dplyr functions, we will end up exploring some more as time goes by. The tidyverse packages actually have a very comprehensive set of web pages full of descriptions and examples for most of the functions in each tidyverse package. You can find the dplyr function page here.
It is often extremely useful to subset your data by some logical condition. We’ve seen some examples above where we used functions and code to identify and keep specific rows using conditional indexing. Let’s dig deeper into that topic.
Conditionals ask a question about one or more values and return a
logical (TRUE or FALSE) result. Here’s a quick
table breaking down the uses of basic conditional statements.
| Logical operator | Meaning | Example | Result |
|---|---|---|---|
| == | value equivalence (ie equal to) | "this" == "that" | FALSE |
| != | not equal to | 4 != 5 | TRUE |
| > | greater than | 4 > 5 | FALSE |
| >= | greater than or equal to | 4 >= 5 | FALSE |
| < | less than | 4 < 5 | TRUE |
| <= | less than or equal to | 4 <= 5 | TRUE |
Cautionary Note: comparisons involving NA return NA rather than TRUE or FALSE, so conditional indexing with == can unexpectedly keep rows that contain NA values.
Mastering the meaning and use of these logical operators will go a long way to helping you in your data science journey!
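These operators also work element-wise on vectors, which is what makes conditional indexing possible. A small sketch with a made-up vector:

```r
# Hypothetical spore doses
doses <- c(0, 10, 20)

# Each element is compared, returning one logical per element
doses > 0
## [1] FALSE  TRUE  TRUE
doses == 10
## [1] FALSE  TRUE FALSE
```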
### Using %in% to compare sets

Sometimes the simplest kind of conditional can be thought of as comparing two sets of data. Which values in set A exist in set B? As an example from our current dataset, we may want to keep all rows that have either N2 OR JU1400 in the Worm_strain column.

To accomplish this using basic functions in R, we turn to the match binary operator, %in%, which asks "for each element of x, is it present in y?" using the syntax x %in% y. This operator returns a logical vector matching the size of x, with TRUE wherever the element from x is found in y. Note that the sizes of x and y need not be identical!
Let’s see what that looks like in the context of our above question.
# Find out more about the match operator by using double quotes
# ?"%in%"
# What does %in% return?
str(infection_meta.tbl$Worm_strain %in% c("N2", "JU1400"))
## logi [1:276] FALSE FALSE FALSE TRUE TRUE TRUE ...
# You can filter your data using basic R commands
# Use the conditional result to index our data.frame
head(infection_meta.tbl[infection_meta.tbl$Worm_strain %in% c("N2", "JU1400"),])
## # A tibble: 6 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_N2_LUAm1_0M_7~ CM           Wild isola~           190423              4
## 2 190426_N2_LUAm1_10M_~ CM           Wild isola~           190423              5
## 3 190426_N2_LUAm1_20M_~ CM           Wild isola~           190423              6
## 4 190426_JU1400_LUAm1_~ CM           Wild isola~           190423             25
## 5 190426_JU1400_LUAm1_~ CM           Wild isola~           190423             26
## 6 190426_JU1400_LUAm1_~ CM           Wild isola~           190423             27
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores(M)/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...
# how many rows (entries) do we find with our query?
nrow(infection_meta.tbl[infection_meta.tbl$Worm_strain %in% c("N2", "JU1400"),])
## [1] 151
# A near-equivalent command using the logical OR
# This, however, is a cautionary example about filtering your data. Watch out for this command!
nrow(infection_meta.tbl[(infection_meta.tbl$Worm_strain == "N2" | infection_meta.tbl$Worm_strain == "JU1400" ),])
## [1] 151
# The above command will also return any entries with NA in your filtered criteria.
# Remember where we saw the `Time plated` column in our previous coding cells?
nrow(infection_meta.tbl[(infection_meta.tbl$`Time plated` == 1300),])
## [1] 276
From our above output, we see that our first filtering step did return 151 rows of data as expected. However, we know that our Time plated column definitely has some NA values, yet we were returned all 276 rows, indicating that the NA values have been kept!
### The filter() function to replicate %in% and more!

From our query above we already know we were asking R to search through our data frame under the Worm_strain column for any matches to N2 OR JU1400. The notation, however, can be a little confusing, whereas the filter() function can accomplish the same task in a more human-readable syntax.

Using the filter() function we can evaluate each row against our criteria. Our first argument will be our data.frame, followed by the information for the rows we want to subset by. Parameters we are interested in are:

- .data: A data frame or data frame extension (e.g. a tibble)
- ...: Expressions that can return a logical value based on the variables within .data

Notably, filter() drops any NA rows/values that might result from our comparisons. Why is that important?
How do we go about constructing expressions for this function? Let’s give it a try!
# But the syntax using filter is much more human readable
filter(infection_meta.tbl,
Worm_strain == "N2" & Worm_strain == "JU1400")
## # A tibble: 0 x 29
## # i 29 variables: experiment <chr>, experimenter <chr>, description <chr>,
## #   `Infection Date` <dbl>, `Plate Number` <dbl>, Worm_strain <chr>,
## #   `Total Worms` <dbl>, `Spore Strain` <chr>, `Spore Lot` <chr>,
## #   `Lot concentration` <dbl>, `Total Spores (M)` <dbl>, `Total ul spore` <dbl>,
## #   `Infection Round` <dbl>, `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>,
## #   `Spores(M)/cm2` <dbl>, `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>,
## #   timepoint <chr>, infection.type <chr>, `Fixing Date` <dbl>, ...
Uh oh! Our code produced an empty tibble because we used the logical
operator & (AND). For us it makes sense to want only N2
AND JU1400, but to R it won’t make sense because a worm
strain can’t be both N2 AND JU1400 at the same time.
That’s why we need to use the | (OR) operator to select
everything that is N2 OR JU1400. Here’s a handy summary
about the remaining logical operators.
| Operator | Description | Use or Result |
|---|---|---|
| ! | Logical NOT | Converts logical results into their opposite |
| & | Element-wise logical AND | Performs element-wise AND; the result length matches that of the longer operand |
| && | Logical AND | Examines only the first element of the operands, resulting in a single logical value ** |
| \| | Element-wise logical OR | Performs element-wise OR; the result length matches that of the longer operand |
| \|\| | Logical OR | Examines only the first element of the operands, resulting in a single logical value ** |
** As of R 4.3.0, this will only compare single-length logical values
Logical operators: To summarize, "&" returns TRUE only if all values in a comparison are TRUE, while "|" returns TRUE if any value in the comparison is TRUE. This logic is applied between index-matched elements and can be combined into more complex statements!
| Logical statement | Evaluation |
|---|---|
| `TRUE & TRUE` | TRUE |
| `TRUE & FALSE`, `FALSE & TRUE`, `FALSE & FALSE` | FALSE |
| `TRUE & TRUE & FALSE` | FALSE |
| `TRUE \| TRUE`, `TRUE \| FALSE`, `FALSE \| TRUE` | TRUE |
| `FALSE \| FALSE` | FALSE |
| `FALSE \| FALSE \| TRUE` | TRUE |
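The element-wise versus single-value distinction can be seen directly. Note that, as of R 4.3.0, && and || throw an error if given an operand longer than one element:

```r
# Element-wise AND: compares position by position
c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)
## [1]  TRUE FALSE FALSE

# Scalar AND: both operands must be single logical values
TRUE && FALSE
## [1] FALSE
```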
Now, let’s try that filter() command again.
# Filter infection_meta.tbl using the proper logical operator
nrow(filter(infection_meta.tbl,
Worm_strain == "N2" | Worm_strain == "JU1400"))
## [1] 151
#Will this work?
nrow(filter(infection_meta.tbl, Worm_strain == c("N2", "JU1400")))
## [1] 88
What happened with our above command? Why did it return only 88 rows? To be honest, it was lucky that the operation worked at all! When R encounters operations between vectors of different size, it will recycle the shorter of the vectors when it can. We briefly discussed this idea in lecture 01 section 3.2.4.2 where we saw vector recycling happening with our matrix creation.
Here's an example:
c(1,2,3) + c(10,11)
## Warning in c(1, 2, 3) + c(10, 11): longer object length is not a multiple of
## shorter object length
## [1] 11 13 13
In this case, R gave us a warning that our vectors don’t match in length. It returned to us a vector of length 3 (our longest vector), and it recycled the 10 from the shorter vector to add to the 3. See the below table for clarification.
| first value | second value | result |
|---|---|---|
| 1 | 10 | 11 |
| 2 | 11 | 13 |
| 3 | 10 | 13 |
R will assume that you know what you are doing as long as one of your vector lengths is a multiple of your other vector length. Here the shorter vector is recycled twice. No warning is given.
# What happens if we increase the length of our first vector?
c(1,2,3,4) + c(10,11)
## [1] 11 13 13 15
### Use %in%, instead of ==

Going back to our broken code:

nrow(filter(infection_meta.tbl, Worm_strain == c("N2", "JU1400")))

While well intentioned, it was basically saying "filter for odd rows where Worm_strain == "N2" AND even rows where Worm_strain == "JU1400"", because the length-2 vector is recycled along the column.

Recall that %in% is a binary match operator that asks "for each element in Worm_strain, does that element exist in the vector c("N2", "JU1400")?"
# Use the correct operator to get the job done when filtering with vectors
nrow(filter(infection_meta.tbl,
Worm_strain %in% c("N2", "JU1400")))
## [1] 151
### Using filter() to identify matching candidates with criteria across multiple variables

We just filtered for multiple worm strains (multiple rows based on the identity of values in a single column). However, you can also filter for rows based on values in multiple columns.

We can do this from basic principles too, but this is where the filter() function really shines, as it keeps the query language clear for us and others to read and interpret.
Operator precedence: Before we jump in, we should quickly note that there is an order of precedence for groups of logical operators. The more "mathematical" comparison operators will be evaluated before the logical operators that combine logical values (ie & and |). You can use parentheses () to control the order of lower-precedence operations. Find out more in the R manual.
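A quick sketch of precedence in action: & is evaluated before |, and parentheses override that order:

```r
# & binds tighter than |, so this reads as TRUE | (TRUE & FALSE)
TRUE | TRUE & FALSE
## [1] TRUE

# Parentheses force the | to be evaluated first
(TRUE | TRUE) & FALSE
## [1] FALSE
```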
For example, you can use the following filtering combinations:
# Query for samples of either Worm strain N2 OR Spore Strain ERTm5
head(filter(infection_meta.tbl,
Worm_strain == "N2" | `Spore Strain` == "ERTm5"))
## # A tibble: 6 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_N2_LUAm1_0M_7~ CM           Wild isola~           190423              4
## 2 190426_N2_LUAm1_10M_~ CM           Wild isola~           190423              5
## 3 190426_N2_LUAm1_20M_~ CM           Wild isola~           190423              6
## 4 190426_VC20019_ERTm5~ CM           Wild isola~           190423             34
## 5 190426_VC20019_ERTm5~ CM           Wild isola~           190423             35
## 6 190426_VC20019_ERTm5~ CM           Wild isola~           190423             36
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores(M)/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...
# == means "is exactly".
# Query for rows with Plate Size = 6 and Spores(M)/cm2 = 0
str(filter(infection_meta.tbl,
`Plate Size` == 6 & `Spores(M)/cm2` == 0),
give.attr = FALSE)
## spc_tbl_ [77 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ experiment : chr [1:77] "190426_VC20019_LUAm1_0M_72hpi" "190426_N2_LUAm1_0M_72hpi" "190426_AB1_LUAm1_0M_72hpi" "190426_JU397_LUAm1_0M_72hpi" ...
## $ experimenter : chr [1:77] "CM" "CM" "CM" "CM" ...
## $ description : chr [1:77] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection Date : num [1:77] 190423 190423 190423 190423 190423 ...
## $ Plate Number : num [1:77] 1 4 7 10 13 16 19 22 25 28 ...
## $ Worm_strain : chr [1:77] "VC20019" "N2" "AB1" "JU397" ...
## $ Total Worms : num [1:77] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore Strain : chr [1:77] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## $ Spore Lot : chr [1:77] "2A" "2A" "2A" "2A" ...
## $ Lot concentration: num [1:77] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## $ Total Spores (M) : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
## $ Total ul spore : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
## $ Infection Round : num [1:77] 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num [1:77] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate Size : num [1:77] 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores(M)/cm2 : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
## $ Time plated : num [1:77] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
## $ Time Incubated : num [1:77] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
## $ Temp : num [1:77] 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr [1:77] "72" "72" "72" "72" ...
## $ infection.type : chr [1:77] "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num [1:77] 190426 190426 190426 190426 190426 ...
## $ Location : chr [1:77] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining Date : num [1:77] 190513 190430 190430 190430 190430 ...
## $ Stain type : chr [1:77] "Sp.9 FISH + DY96" "DY96" "DY96" "DY96" ...
## $ Slide date : num [1:77] 190515 190501 190501 190501 190501 ...
## $ Slide number : num [1:77] 1 4 7 10 13 16 19 22 25 28 ...
## $ Slide Box : num [1:77] 2 2 2 2 2 2 2 2 2 2 ...
## $ Imaging Date : num [1:77] 190516 190502 190502 190502 190502 ...
# equivalently, the ',' represents an implicit &
str(filter(infection_meta.tbl,
`Plate Size` == 6,
`Spores(M)/cm2` == 0),
give.attr = FALSE)
## spc_tbl_ [77 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ experiment : chr [1:77] "190426_VC20019_LUAm1_0M_72hpi" "190426_N2_LUAm1_0M_72hpi" "190426_AB1_LUAm1_0M_72hpi" "190426_JU397_LUAm1_0M_72hpi" ...
## $ experimenter : chr [1:77] "CM" "CM" "CM" "CM" ...
## $ description : chr [1:77] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection Date : num [1:77] 190423 190423 190423 190423 190423 ...
## $ Plate Number : num [1:77] 1 4 7 10 13 16 19 22 25 28 ...
## $ Worm_strain : chr [1:77] "VC20019" "N2" "AB1" "JU397" ...
## $ Total Worms : num [1:77] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore Strain : chr [1:77] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## $ Spore Lot : chr [1:77] "2A" "2A" "2A" "2A" ...
## $ Lot concentration: num [1:77] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## $ Total Spores (M) : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
## $ Total ul spore : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
## $ Infection Round : num [1:77] 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num [1:77] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate Size : num [1:77] 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores(M)/cm2 : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
## $ Time plated : num [1:77] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
## $ Time Incubated : num [1:77] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
## $ Temp : num [1:77] 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr [1:77] "72" "72" "72" "72" ...
## $ infection.type : chr [1:77] "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num [1:77] 190426 190426 190426 190426 190426 ...
## $ Location : chr [1:77] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining Date : num [1:77] 190513 190430 190430 190430 190430 ...
## $ Stain type : chr [1:77] "Sp.9 FISH + DY96" "DY96" "DY96" "DY96" ...
## $ Slide date : num [1:77] 190515 190501 190501 190501 190501 ...
## $ Slide number : num [1:77] 1 4 7 10 13 16 19 22 25 28 ...
## $ Slide Box : num [1:77] 2 2 2 2 2 2 2 2 2 2 ...
## $ Imaging Date : num [1:77] 190516 190502 190502 190502 190502 ...
# != means "is not"
# Query for experiments on plate size not equal to 6 and Spore density not equal to 0
str(filter(infection_meta.tbl,
`Plate Size` != 6,
`Spores(M)/cm2` != 0),
give.attr = FALSE)
## spc_tbl_ [18 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ experiment : chr [1:18] "200821_N2_LUAm1_30M_72hpi" "200821_JU1400_LUAm1_30M_72hpi" "200821_N2_ERTm5_10M_72hpi" "200821_JU1400_ERTm5_10M_72hpi" ...
## $ experimenter : chr [1:18] "CM" "CM" "CM" "CM" ...
## $ description : chr [1:18] "RNAseq data rep 1" "RNAseq data rep 1" "RNAseq data rep 1" "RNAseq data rep 1" ...
## $ Infection Date : num [1:18] 200818 200818 200818 200818 200818 ...
## $ Plate Number : num [1:18] 13 14 15 16 17 18 27 28 29 30 ...
## $ Worm_strain : chr [1:18] "N2" "JU1400" "N2" "JU1400" ...
## $ Total Worms : num [1:18] 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 ...
## $ Spore Strain : chr [1:18] "LUAm1" "LUAm1" "ERTm5" "ERTm5" ...
## $ Spore Lot : chr [1:18] "2A" "2A" "2" "2" ...
## $ Lot concentration: num [1:18] 176000 176000 427000 427000 370000 370000 176000 176000 176000 176000 ...
## $ Total Spores (M) : num [1:18] 30 30 10 10 12 12 10 10 10 10 ...
## $ Total ul spore : num [1:18] 170.5 170.5 23.4 23.4 32.4 ...
## $ Infection Round : num [1:18] 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num [1:18] 1 1 1 1 1 1 0.25 0.25 0.25 0.25 ...
## $ Plate Size : num [1:18] 10 10 10 10 10 10 10 10 10 10 ...
## $ Spores(M)/cm2 : num [1:18] 0.382 0.382 0.127 0.127 0.153 ...
## $ Time plated : num [1:18] NA NA NA NA NA NA NA NA NA NA ...
## $ Time Incubated : num [1:18] 48 48 48 48 48 48 72 72 72 72 ...
## $ Temp : num [1:18] 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr [1:18] "72" "72" "72" "72" ...
## $ infection.type : chr [1:18] "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num [1:18] 200821 200821 200821 200821 200821 ...
## $ Location : chr [1:18] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining Date : num [1:18] 200831 200831 200831 200831 200831 ...
## $ Stain type : chr [1:18] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "microB FISH + DY96" "microB FISH + DY96" ...
## $ Slide date : num [1:18] 200903 200903 200903 200903 200903 ...
## $ Slide number : num [1:18] 13 14 15 16 17 18 27 28 29 30 ...
## $ Slide Box : num [1:18] 4 4 4 4 4 4 4 4 4 4 ...
## $ Imaging Date : num [1:18] 200912 200912 200912 200912 200912 ...
# >= means "greater than or equal to"
# Query for experiments completed on plates smaller than 10cm and with at least 0.2 Spores(M)/cm2
str(filter(infection_meta.tbl,
`Plate Size` < 10,
`Spores(M)/cm2` >= 0.2),
give.attr = FALSE)
## spc_tbl_ [79 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ experiment : chr [1:79] "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_10M_72hpi" "190426_N2_LUAm1_20M_72hpi" ...
## $ experimenter : chr [1:79] "CM" "CM" "CM" "CM" ...
## $ description : chr [1:79] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection Date : num [1:79] 190423 190423 190423 190423 190423 ...
## $ Plate Number : num [1:79] 2 3 5 6 8 9 11 12 14 15 ...
## $ Worm_strain : chr [1:79] "VC20019" "VC20019" "N2" "N2" ...
## $ Total Worms : num [1:79] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore Strain : chr [1:79] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## $ Spore Lot : chr [1:79] "2A" "2A" "2A" "2A" ...
## $ Lot concentration: num [1:79] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## $ Total Spores (M) : num [1:79] 10 20 10 20 10 20 10 20 10 20 ...
## $ Total ul spore : num [1:79] 56.8 113.6 56.8 113.6 56.8 ...
## $ Infection Round : num [1:79] 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num [1:79] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate Size : num [1:79] 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores(M)/cm2 : num [1:79] 0.354 0.708 0.354 0.708 0.354 ...
## $ Time plated : num [1:79] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
## $ Time Incubated : num [1:79] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
## $ Temp : num [1:79] 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr [1:79] "72" "72" "72" "72" ...
## $ infection.type : chr [1:79] "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num [1:79] 190426 190426 190426 190426 190426 ...
## $ Location : chr [1:79] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining Date : num [1:79] 190513 190513 190513 190513 190513 ...
## $ Stain type : chr [1:79] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" ...
## $ Slide date : num [1:79] 190515 190515 190515 190515 190515 ...
## $ Slide number : num [1:79] 2 3 5 6 8 9 11 12 14 15 ...
## $ Slide Box : num [1:79] 2 2 2 2 2 2 2 2 2 2 ...
## $ Imaging Date : num [1:79] 190516 190516 190516 190516 190516 ...
# <= means "less than or equal to"
# Query for experiments where the Spores(M)/cm2 ratio is above 0 and <= 0.5
str(filter(infection_meta.tbl,
`Spores(M)/cm2` > 0,
`Spores(M)/cm2` <= 0.5),
give.attr = FALSE)
## spc_tbl_ [177 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ experiment : chr [1:177] "190426_VC20019_LUAm1_10M_72hpi" "190426_N2_LUAm1_10M_72hpi" "190426_AB1_LUAm1_10M_72hpi" "190426_JU397_LUAm1_10M_72hpi" ...
## $ experimenter : chr [1:177] "CM" "CM" "CM" "CM" ...
## $ description : chr [1:177] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection Date : num [1:177] 190423 190423 190423 190423 190423 ...
## $ Plate Number : num [1:177] 2 5 8 11 14 17 20 23 26 29 ...
## $ Worm_strain : chr [1:177] "VC20019" "N2" "AB1" "JU397" ...
## $ Total Worms : num [1:177] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore Strain : chr [1:177] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## $ Spore Lot : chr [1:177] "2A" "2A" "2A" "2A" ...
## $ Lot concentration: num [1:177] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## $ Total Spores (M) : num [1:177] 10 10 10 10 10 10 10 10 10 10 ...
## $ Total ul spore : num [1:177] 56.8 56.8 56.8 56.8 56.8 ...
## $ Infection Round : num [1:177] 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num [1:177] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate Size : num [1:177] 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores(M)/cm2 : num [1:177] 0.354 0.354 0.354 0.354 0.354 ...
## $ Time plated : num [1:177] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
## $ Time Incubated : num [1:177] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
## $ Temp : num [1:177] 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr [1:177] "72" "72" "72" "72" ...
## $ infection.type : chr [1:177] "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num [1:177] 190426 190426 190426 190426 190426 ...
## $ Location : chr [1:177] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining Date : num [1:177] 190513 190513 190513 190513 190513 ...
## $ Stain type : chr [1:177] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" ...
## $ Slide date : num [1:177] 190515 190515 190515 190515 190515 ...
## $ Slide number : num [1:177] 2 5 8 11 14 17 20 23 26 29 ...
## $ Slide Box : num [1:177] 2 2 2 2 2 2 2 2 2 2 ...
## $ Imaging Date : num [1:177] 190516 190516 190516 190516 190516 ...
# Further filter the information by only using data from worm strains N2 and JU1400
str(filter(infection_meta.tbl,
`Spores(M)/cm2` > 0,
`Spores(M)/cm2` <= 0.5,
Worm_strain %in% c("N2", "JU1400")),
give.attr = FALSE)
## spc_tbl_ [110 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ experiment : chr [1:110] "190426_N2_LUAm1_10M_72hpi" "190426_JU1400_LUAm1_10M_72hpi" "190426_N2_ERTm5_1.75M_72hpi" "190426_N2_ERTm5_3.5M_72hpi" ...
## $ experimenter : chr [1:110] "CM" "CM" "CM" "CM" ...
## $ description : chr [1:110] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection Date : num [1:110] 190423 190423 190423 190423 190423 ...
## $ Plate Number : num [1:110] 5 26 38 39 47 48 56 57 13 15 ...
## $ Worm_strain : chr [1:110] "N2" "JU1400" "N2" "N2" ...
## $ Total Worms : num [1:110] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore Strain : chr [1:110] "LUAm1" "LUAm1" "ERTm5" "ERTm5" ...
## $ Spore Lot : chr [1:110] "2A" "2A" "2" "2" ...
## $ Lot concentration: num [1:110] 176000 176000 427000 427000 427000 427000 240000 240000 176000 176000 ...
## $ Total Spores (M) : num [1:110] 10 10 1.75 3.5 1.75 3.5 0.5 1.5 1 1 ...
## $ Total ul spore : num [1:110] 56.8 56.8 4.1 8.2 4.1 ...
## $ Infection Round : num [1:110] 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num [1:110] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate Size : num [1:110] 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores(M)/cm2 : num [1:110] 0.3539 0.3539 0.0619 0.1238 0.0619 ...
## $ Time plated : num [1:110] 1300 1300 1300 1300 1300 1300 1300 1300 NA NA ...
## $ Time Incubated : num [1:110] 1600 1600 1600 1600 1600 1600 1600 1600 NA NA ...
## $ Temp : num [1:110] 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr [1:110] "72" "72" "72" "72" ...
## $ infection.type : chr [1:110] "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num [1:110] 190426 190426 190426 190426 190426 ...
## $ Location : chr [1:110] "Sample exhausted" "Sample exhausted" "None" "None" ...
## $ Staining Date : num [1:110] 190513 190513 190529 190529 190529 ...
## $ Stain type : chr [1:110] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "microb FISH + DY96" "microb FISH + DY96" ...
## $ Slide date : num [1:110] 190515 190515 190530 190530 190530 ...
## $ Slide number : num [1:110] 5 26 38 39 47 48 56 57 1 3 ...
## $ Slide Box : num [1:110] 2 2 3 3 3 3 2 2 3 3 ...
## $ Imaging Date : num [1:110] 190516 190516 201026 201026 201026 ...
# What if we wanted to view strains that are NOT N2 or JU1400?
str(filter(infection_meta.tbl,
`Spores(M)/cm2` > 0,
`Spores(M)/cm2` <= 0.5,
!Worm_strain %in% c("N2", "JU1400")),
give.attr = FALSE)
## spc_tbl_ [67 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ experiment : chr [1:67] "190426_VC20019_LUAm1_10M_72hpi" "190426_AB1_LUAm1_10M_72hpi" "190426_JU397_LUAm1_10M_72hpi" "190426_JU642_LUAm1_10M_72hpi" ...
## $ experimenter : chr [1:67] "CM" "CM" "CM" "CM" ...
## $ description : chr [1:67] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection Date : num [1:67] 190423 190423 190423 190423 190423 ...
## $ Plate Number : num [1:67] 2 8 11 14 17 20 23 29 32 35 ...
## $ Worm_strain : chr [1:67] "VC20019" "AB1" "JU397" "JU642" ...
## $ Total Worms : num [1:67] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore Strain : chr [1:67] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## $ Spore Lot : chr [1:67] "2A" "2A" "2A" "2A" ...
## $ Lot concentration: num [1:67] 176000 176000 176000 176000 176000 176000 176000 176000 176000 427000 ...
## $ Total Spores (M) : num [1:67] 10 10 10 10 10 10 10 10 10 1.75 ...
## $ Total ul spore : num [1:67] 56.8 56.8 56.8 56.8 56.8 ...
## $ Infection Round : num [1:67] 1 1 1 1 1 1 1 1 1 1 ...
## $ 40X OP50 (mL) : num [1:67] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate Size : num [1:67] 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores(M)/cm2 : num [1:67] 0.354 0.354 0.354 0.354 0.354 ...
## $ Time plated : num [1:67] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
## $ Time Incubated : num [1:67] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
## $ Temp : num [1:67] 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr [1:67] "72" "72" "72" "72" ...
## $ infection.type : chr [1:67] "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing Date : num [1:67] 190426 190426 190426 190426 190426 ...
## $ Location : chr [1:67] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining Date : num [1:67] 190513 190513 190513 190513 190513 ...
## $ Stain type : chr [1:67] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" ...
## $ Slide date : num [1:67] 190515 190515 190515 190515 190515 ...
## $ Slide number : num [1:67] 2 8 11 14 17 20 23 29 32 35 ...
## $ Slide Box : num [1:67] 2 2 2 2 2 2 2 2 2 2 ...
## $ Imaging Date : num [1:67] 190516 190516 190516 190516 190516 ...
# Be careful how you filter your data! If none of the rows meet your criteria, it can return an empty tibble!
# Query experiments for any instances of Spores(M)/cm2 < 0.
str(filter(infection_meta.tbl,
`Spores(M)/cm2` < 0),
give.attr = FALSE)
## spc_tbl_ [0 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ experiment : chr(0)
## $ experimenter : chr(0)
## $ description : chr(0)
## $ Infection Date : num(0)
## $ Plate Number : num(0)
## $ Worm_strain : chr(0)
## $ Total Worms : num(0)
## $ Spore Strain : chr(0)
## $ Spore Lot : chr(0)
## $ Lot concentration: num(0)
## $ Total Spores (M) : num(0)
## $ Total ul spore : num(0)
## $ Infection Round : num(0)
## $ 40X OP50 (mL) : num(0)
## $ Plate Size : num(0)
## $ Spores(M)/cm2 : num(0)
## $ Time plated : num(0)
## $ Time Incubated : num(0)
## $ Temp : num(0)
## $ timepoint : chr(0)
## $ infection.type : chr(0)
## $ Fixing Date : num(0)
## $ Location : chr(0)
## $ Staining Date : num(0)
## $ Stain type : chr(0)
## $ Slide date : num(0)
## $ Slide number : num(0)
## $ Slide Box : num(0)
## $ Imaging Date : num(0)
A powerful tool called regular expressions (regex) can also be used for partial character matching. Regex exists in virtually every programming language, not only R, so familiarizing yourself with it is a must as a programmer.
We will spend a large chunk of Lecture 05 discussing regular expressions. Until then, just remember that you can use them as part of your filtering process. Below you’ll find some useful functions that can help you accomplish this.
# More about regex
?regex()
# search for matches to argument pattern
?grep()
?grepl()
?regexpr()
?gregexpr()
?regexec()
# perform replacement of the first and all matches respectively.
?sub()
?gsub()
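As a quick preview of how these tools plug into filtering, here is a minimal sketch: `grepl()` returns a TRUE/FALSE value for each element, so it slots straight into `filter()` for partial matching. The tibble below is made up purely for illustration; with real data you would use `infection_meta.tbl` instead.

```r
library(dplyr)

# A made-up stand-in for our metadata (illustration only)
toy.tbl <- tibble::tibble(experiment = c("190426_N2_LUAm1_10M_72hpi",
                                         "190426_JU1400_ERTm5_3.5M_72hpi",
                                         "190605_N2_LUAm1_1M_24hpi"))

# Keep only rows whose experiment name contains "JU1400"
filter(toy.tbl, grepl("JU1400", experiment))
```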
select() to subset and order columns in your data frame
Oftentimes we don’t want all of the columns in our data frame. You can subset or remove columns using the select() function, and you can also reorder columns with it. Essentially, this is a great way to move columns around your data frame, or to select() only the data columns you want.
The select() function takes the format select(data, ...) where:
data is your data.frame or tibble object.
... is a comma-separated list of column names from data, based on a concise mini-language used in the tidyverse.
While there are many ways to select your columns with this function, we’ll cover a handful of the more common ones.
Suppose I want to look only at some of the experimental information, including the various infection/fixing/imaging dates as well as the worm and spore strain information.
# We just want to know information related to strain names, spore info and dates
head(select(infection_meta.tbl,
experiment, Worm_strain,
`Spore Strain`, `Spore Lot`, `Total Spores (M)`,
`Infection Date`, `Fixing Date`, `Imaging Date` ))
## # A tibble: 6 x 8
## experiment Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
## <chr> <chr> <chr> <chr> <dbl>
## 1 190426_VC20019_LUAm~ VC20019 LUAm1 2A 0
## 2 190426_VC20019_LUAm~ VC20019 LUAm1 2A 10
## 3 190426_VC20019_LUAm~ VC20019 LUAm1 2A 20
## 4 190426_N2_LUAm1_0M_~ N2 LUAm1 2A 0
## 5 190426_N2_LUAm1_10M~ N2 LUAm1 2A 10
## 6 190426_N2_LUAm1_20M~ N2 LUAm1 2A 20
## # i 3 more variables: `Infection Date` <dbl>, `Fixing Date` <dbl>,
## # `Imaging Date` <dbl>
starts_with() and ends_with() helper functions to specify elements from a vector
dplyr also includes some helper functions that allow you to select variables (columns) based on their names. For example, we have both Spore Strain and Spore Lot columns. We can shortcut both of those using the starts_with() helper function. Likewise, we can select all of our “X Date” columns using the ends_with() function.
# Select for columns starting with the word "Spore" or ending with "Date"
head(select(infection_meta.tbl,
experiment, Worm_strain,
starts_with("Spore", ignore.case = FALSE), `Total Spores (M)`,
ends_with("Date", ignore.case = FALSE)))
## # A tibble: 6 x 10
## experiment Worm_strain `Spore Strain` `Spore Lot` `Spores(M)/cm2`
## <chr> <chr> <chr> <chr> <dbl>
## 1 190426_VC20019_LUAm1_0~ VC20019 LUAm1 2A 0
## 2 190426_VC20019_LUAm1_1~ VC20019 LUAm1 2A 0.354
## 3 190426_VC20019_LUAm1_2~ VC20019 LUAm1 2A 0.708
## 4 190426_N2_LUAm1_0M_72h~ N2 LUAm1 2A 0
## 5 190426_N2_LUAm1_10M_72~ N2 LUAm1 2A 0.354
## 6 190426_N2_LUAm1_20M_72~ N2 LUAm1 2A 0.708
## # i 5 more variables: `Total Spores (M)` <dbl>, `Infection Date` <dbl>,
## # `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>
As you can see from above, by using these helper verbs, we were able to pick up some extra columns that we’d “forgotten” about that would have some helpful information. We also reduced the amount of coding we had to generate and reduced our chances of errors due to spelling or typos!
contains() helper function to select text within column names!
Now that we can base our column selection on the start or end of column names, we can also use the occurrence of words or patterns within them. Let’s do one last example and simplify how we select column names like Spore Strain and Total Spores (M).
A note about helper functions! The one caveat in our quest to simplify selecting columns is that we don’t have as much control over the specific selection order. Within these helper functions, the resulting selections are ordered based on their relative placement within the data frame or tibble.
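A minimal sketch of that ordering behaviour, using a made-up three-column tibble (names invented for illustration):

```r
library(dplyr)

# Made-up columns: note that b_date sits before c_date in the tibble itself
toy.tbl <- tibble::tibble(b_date = 1, a_val = 2, c_date = 3)

# ends_with() returns b_date then c_date - their placement order in the
# data frame, not alphabetical order or the order we might have preferred
select(toy.tbl, ends_with("date"))
```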
# Simplify our previous column selections using the contains() helper
# Save the result as meta_trimmed.tbl
meta_trimmed.tbl <- select(infection_meta.tbl,
experiment, Worm_strain,
contains("Spore", ignore.case = FALSE),
ends_with("Date", ignore.case = FALSE))
# Take a look at the resulting table
head(meta_trimmed.tbl)
## # A tibble: 6 x 10
## experiment Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
## <chr> <chr> <chr> <chr> <dbl>
## 1 190426_VC20019_LUAm~ VC20019 LUAm1 2A 0
## 2 190426_VC20019_LUAm~ VC20019 LUAm1 2A 10
## 3 190426_VC20019_LUAm~ VC20019 LUAm1 2A 20
## 4 190426_N2_LUAm1_0M_~ N2 LUAm1 2A 0
## 5 190426_N2_LUAm1_10M~ N2 LUAm1 2A 10
## 6 190426_N2_LUAm1_20M~ N2 LUAm1 2A 20
## # i 5 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## # `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>
arrange()
The arrange(data, ...) function helps you sort your data. The default behaviour is to order results from smallest to largest (or a-z for character data). You can switch the order by specifying desc() (descending), as shown below. You can think of this like sorting in Excel, and you can give precedence to multiple columns by separating each with a ,. Rows will be ordered based on the order of the column names submitted.
Let’s sort the meta_trimmed.tbl that we’ve generated in previous steps.
# Arrange the trimmed metadata in descending order of Total Spores
desc_totalSpores <- arrange(meta_trimmed.tbl, desc(`Total Spores (M)`))
# Take a look at the sorted data
head(desc_totalSpores)
## # A tibble: 6 x 10
## experiment Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
## <chr> <chr> <chr> <chr> <dbl>
## 1 200824_N2-rep1_LUAm~ N2-rep1 LUAm1 2A 40
## 2 200824_JU1400-rep1_~ JU1400-rep1 LUAm1 2A 40
## 3 200821_N2_LUAm1_30M~ N2 LUAm1 2A 30
## 4 200821_JU1400_LUAm1~ JU1400 LUAm1 2A 30
## 5 190426_VC20019_LUAm~ VC20019 LUAm1 2A 20
## 6 190426_N2_LUAm1_20M~ N2 LUAm1 2A 20
## # i 5 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## # `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>
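To see multi-column precedence in isolation, here is a small sketch on a made-up tibble: rows are sorted by the first column given, and ties are broken by the next.

```r
library(dplyr)

# Made-up strain/dose data for illustration
toy.tbl <- tibble::tibble(strain = c("N2", "JU1400", "N2"),
                          dose   = c(5, 10, 10))

# Sort by strain (a-z) first, then break ties by dose, largest first
arrange(toy.tbl, strain, desc(dose))
```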
Suppose we want to use the sorted data above to look only at infection experiments with > 0M spores, in samples using only N2 and JU1400 worms infected by LUAm1 and ERTm5 microsporidia. Arrange these by infection date in ascending order.
How would you do it? How many experimental conditions are there that meet our criteria?
# Filter the data by Total spores, worm strains, and microsporidia strains
# Extra var 1
desc_totalSpores_filtered <- filter(desc_totalSpores,
`Total Spores (M)` > 0,
Worm_strain %in% c("N2", "JU1400"),
`Spore Strain` %in% c("LUAm1", "ERTm5"))
# Sort the data by Infection Date
# Extra var 2
desc_totalSpores_filtered_asc <- arrange(desc_totalSpores_filtered, `Infection Date`)
# Retrieve the experiment names
# Extra var 3
select_experiments <- select(desc_totalSpores_filtered_asc, experiment)
# How many observations (rows) are there?
nrow(select_experiments)
## [1] 74
%>%
While the above code answered the question, it also created a series of intermediate variables that we aren’t interested in. These ‘intermediate variables’ were used only to store data that was passed as input to the next function - you’ll notice that we didn’t need them for anything else in the code! If we aren’t careful, this will quickly clutter our global environment (and memory!), which keeps track of these objects for us. Instead, we can use a more “natural flow” of data in our code.
The dplyr package, and some other common packages for
data frame manipulation in the tidyverse allow the use of
the pipe function, %>%. This is equivalent to
| for any UNIX aficionados. Piping allows
the output of one function to be passed along to the next function
without creating intermediate variables. Piping can save typing, make
your code more readable, and reduce clutter in your global environment
from variables you don’t need. The keyboard shortcut for
%>% is CTRL+SHIFT+M.
In essence, the %>% pipe takes output from the left-hand side and passes it as input to the right-hand side. As an example we’ll look at how pipes work in conjunction with the function filter(), and then see the benefits of simplifying the code that we just wrote.
# Remember that R evaluates () from the inside out
head(filter(meta_trimmed.tbl, `Total Spores (M)` > 0))
## # A tibble: 6 x 10
## experiment Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
## <chr> <chr> <chr> <chr> <dbl>
## 1 190426_VC20019_LUAm~ VC20019 LUAm1 2A 10
## 2 190426_VC20019_LUAm~ VC20019 LUAm1 2A 20
## 3 190426_N2_LUAm1_10M~ N2 LUAm1 2A 10
## 4 190426_N2_LUAm1_20M~ N2 LUAm1 2A 20
## 5 190426_AB1_LUAm1_10~ AB1 LUAm1 2A 10
## 6 190426_AB1_LUAm1_20~ AB1 LUAm1 2A 20
## # i 5 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## # `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>
# Break the nested functions into their order of execution
meta_trimmed.tbl %>% filter(`Total Spores (M)` > 0) %>% head()
## # A tibble: 6 x 10
## experiment Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
## <chr> <chr> <chr> <chr> <dbl>
## 1 190426_VC20019_LUAm~ VC20019 LUAm1 2A 10
## 2 190426_VC20019_LUAm~ VC20019 LUAm1 2A 20
## 3 190426_N2_LUAm1_10M~ N2 LUAm1 2A 10
## 4 190426_N2_LUAm1_20M~ N2 LUAm1 2A 20
## 5 190426_AB1_LUAm1_10~ AB1 LUAm1 2A 10
## 6 190426_AB1_LUAm1_20~ AB1 LUAm1 2A 20
## # i 5 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## # `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>
# You can separate one or more functions in the pipeline
meta_trimmed.tbl %>%
# Notice the "." in the first position of filter - this is where data normally is assigned as a parameter
filter(., `Total Spores (M)` > 0) %>%
# Pass the filtered data to the head() function
head()
## # A tibble: 6 x 10
## experiment Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
## <chr> <chr> <chr> <chr> <dbl>
## 1 190426_VC20019_LUAm~ VC20019 LUAm1 2A 10
## 2 190426_VC20019_LUAm~ VC20019 LUAm1 2A 20
## 3 190426_N2_LUAm1_10M~ N2 LUAm1 2A 10
## 4 190426_N2_LUAm1_20M~ N2 LUAm1 2A 20
## 5 190426_AB1_LUAm1_10~ AB1 LUAm1 2A 10
## 6 190426_AB1_LUAm1_20~ AB1 LUAm1 2A 20
## # i 5 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## # `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>
. with %>% denotes the object produced by the last called function
You’ll notice that when piping, we are not explicitly writing the first argument (our data frame) to filter(), but rather passing it along using %>%. The dot . is sometimes used as a placeholder to fill in the first, or a later, argument. This notation is useful for nested functions (functions inside functions) within our piping, which we will come across a bit later.
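Here is a tiny sketch of the dot placeholder: when . appears as a direct argument, the piped object fills that slot instead of the default first position.

```r
library(magrittr)  # provides %>% (also available via dplyr/tidyverse)

# Without ".", the piped vector becomes the FIRST argument of paste()
c("Spore", "Worm") %>% paste("Strain")  # "Spore Strain" "Worm Strain"

# With "." as a direct argument, the piped vector fills that slot instead
c("Spore", "Worm") %>% paste("The", .)  # "The Spore" "The Worm"
```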
What would working with pipes look like for our more complex example? Recall we want to filter for infection experiments with > 0M spores in samples using only N2 and JU1400 worms infected by LUAm1 and ERTm5 microsporidia. Arrange these by infection date in ascending order and display the first 20 entries.
# 1. Filter the data
# 2. Arrange the result
# 3. Grab the experiment
# 4. Print the first 20 entries
meta_trimmed.tbl %>%
filter(`Total Spores (M)` > 0, Worm_strain %in% c("N2", "JU1400"),`Spore Strain` %in% c("LUAm1", "ERTm5")) %>%
arrange(`Infection Date`) %>%
select(experiment) %>%
head(20)
## # A tibble: 20 x 1
## experiment
## <chr>
##  1 190426_N2_LUAm1_10M_72hpi
##  2 190426_N2_LUAm1_20M_72hpi
##  3 190426_JU1400_LUAm1_10M_72hpi
##  4 190426_JU1400_LUAm1_20M_72hpi
##  5 190426_N2_ERTm5_1.75M_72hpi
##  6 190426_N2_ERTm5_3.5M_72hpi
##  7 190426_JU1400_ERTm5_1.75M_72hpi
##  8 190426_JU1400_ERTm5_3.5M_72hpi
##  9 190605_N2_LUAm1_1M_24hpi
## 10 190605_JU1400_LUAm1_1M_24hpi
## 11 190606_N2_LUAm1_1M_48hpi
## 12 190606_JU1400_LUAm1_1M_48hpi
## 13 190607_N2_LUAm1_1M_72hpi
## 14 190607_JU1400_LUAm1_1M_72hpi
## 15 190611_N2_LUAm1_6M_0.5hpi
## 16 190611_JU1400_LUAm1_6M_0.5hpi
## 17 190611_N2_LUAm1_6M_3hpi
## 18 190611_JU1400_LUAm1_6M_3hpi
## 19 190808_N2_LUAm1_5M_24hpi
## 20 190808_JU1400_LUAm1_5M_24hpi
When using more than two pipes (%>%), code gets hard to follow for a reader (or your future self). Starting a new line after each pipe lets a reader easily see which function is operating and makes your logic easier to follow. Using pipes also has the benefit that extra intermediate variables do not need to be created, freeing up your global environment for objects you are interested in keeping.
For this example we’ve tab-indented subsequent commands and parameters in the pipeline to further separate things visually.
# Pass our data.frame
meta_trimmed.tbl %>%
# 1. Filter the data
filter(`Total Spores (M)` > 0, # > 0 spores per infection
Worm_strain %in% c("N2", "JU1400"), # Only N2 and JU1400 animals
`Spore Strain` %in% c("LUAm1", "ERTm5")) %>% # Only LUAm1 and ERTm5 infections
# 2. Arrange the result
arrange(`Infection Date`) %>%
# 3. Grab the experiment column
select(experiment) %>%
# 4. Print the first 20 entries
head(20)
## # A tibble: 20 x 1
## experiment
## <chr>
##  1 190426_N2_LUAm1_10M_72hpi
##  2 190426_N2_LUAm1_20M_72hpi
##  3 190426_JU1400_LUAm1_10M_72hpi
##  4 190426_JU1400_LUAm1_20M_72hpi
##  5 190426_N2_ERTm5_1.75M_72hpi
##  6 190426_N2_ERTm5_3.5M_72hpi
##  7 190426_JU1400_ERTm5_1.75M_72hpi
##  8 190426_JU1400_ERTm5_3.5M_72hpi
##  9 190605_N2_LUAm1_1M_24hpi
## 10 190605_JU1400_LUAm1_1M_24hpi
## 11 190606_N2_LUAm1_1M_48hpi
## 12 190606_JU1400_LUAm1_1M_48hpi
## 13 190607_N2_LUAm1_1M_72hpi
## 14 190607_JU1400_LUAm1_1M_72hpi
## 15 190611_N2_LUAm1_6M_0.5hpi
## 16 190611_JU1400_LUAm1_6M_0.5hpi
## 17 190611_N2_LUAm1_6M_3hpi
## 18 190611_JU1400_LUAm1_6M_3hpi
## 19 190808_N2_LUAm1_5M_24hpi
## 20 190808_JU1400_LUAm1_5M_24hpi
summarise()
We can use summarise(data, ...) to define and retrieve summarised information about our dataset in a simplified way. This essentially creates a new data.frame object summarising our observations based on the functions supplied. Multiple functions and their results can be placed into new columns that we name. This is essentially the same as running the apply() function on specific columns, except you can choose the columns and how they are analysed!
Let’s generate some values based on the Total Spores (M) column of meta_trimmed.tbl.
# Summarise Total Spores: sum, mean, and standard deviation across all rows combined
summarise(meta_trimmed.tbl,
totalSpores_sum = sum(`Total Spores (M)`),
totalSpores_mean = mean(`Total Spores (M)`),
totalSpores_sd = sd(`Total Spores (M)`)
)
## # A tibble: 1 x 3
## totalSpores_sum totalSpores_mean totalSpores_sd
## <dbl> <dbl> <dbl>
## 1 1429. 5.18 6.34
Don’t forget about NA values! Remember that a number of functions can be told to ignore NA values when calculating their results. You’ll have to check their parameter documentation to be sure - for instance, use ?mean to check whether it can ignore NA values.
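A minimal sketch of that NA behaviour, using a made-up column:

```r
library(dplyr)

# Made-up data with one missing observation
toy.tbl <- tibble::tibble(spores = c(10, 20, NA))

# Any NA in the column makes the default mean() return NA
summarise(toy.tbl, m = mean(spores))                # m is NA

# na.rm = TRUE tells mean() to drop NAs before calculating
summarise(toy.tbl, m = mean(spores, na.rm = TRUE))  # m is 15
```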
group_by() to reorder data based on variable categories
Does the summary from above really make sense? Not exactly. We are looking at Total Spores (M), but there are many different microsporidia strains being tested across different conditions (i.e., worm strains). We should take more variables into consideration. First, let’s summarise by Spore Strain using group_by() along with summarise().
The function group_by() produces a grouped data.frame object which behaves mostly like a standard data.frame, but also carries meta-information about the grouping you’ve specified. You can group by a single variable (column) or multiple ones to produce multi-layered groupings. This underlying grouping is recognized by other dplyr methods such as summarise()!
# Pass along trimmed data
meta_trimmed.tbl %>%
# group by Spore strain
group_by(., `Spore Strain`) %>%
# Look at the first 10 rows
head(., 10)
## # A tibble: 10 x 10
## # Groups: Spore Strain [1]
## experiment Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
## <chr> <chr> <chr> <chr> <dbl>
##  1 190426_VC20019_LUA~ VC20019 LUAm1 2A 0
##  2 190426_VC20019_LUA~ VC20019 LUAm1 2A 10
##  3 190426_VC20019_LUA~ VC20019 LUAm1 2A 20
##  4 190426_N2_LUAm1_0M~ N2 LUAm1 2A 0
##  5 190426_N2_LUAm1_10~ N2 LUAm1 2A 10
##  6 190426_N2_LUAm1_20~ N2 LUAm1 2A 20
##  7 190426_AB1_LUAm1_0~ AB1 LUAm1 2A 0
##  8 190426_AB1_LUAm1_1~ AB1 LUAm1 2A 10
##  9 190426_AB1_LUAm1_2~ AB1 LUAm1 2A 20
## 10 190426_JU397_LUAm1~ JU397 LUAm1 2A 0
## # i 5 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## # `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>
Doesn’t look very different from a regular data.frame
does it? What if we try to summarise() with it?
# Pass along trimmed data
meta_trimmed.tbl %>%
# group by Spore strain
group_by(., `Spore Strain`) %>%
# Summarise the data now
...(.,
totalSpores_sum = sum(`Total Spores (M)`),
totalSpores_mean = mean(`Total Spores (M)`),
totalSpores_sd = sd(`Total Spores (M)`))
## Error in ...(., totalSpores_sum = sum(`Total Spores (M)`), totalSpores_mean = mean(`Total Spores (M)`), : could not find function "..."
Notice that summarise() created a new tibble with the columns totalSpores_sum, totalSpores_mean and totalSpores_sd. You can name these columns whatever you want as you write the code.
We also see the Spore Strain column that we used in the group_by() command. Any columns used in that command will also be included, since they are the foundation of the summarise() call.
# Here's the equivalent code without piping
summarise(group_by(meta_trimmed.tbl, `Spore Strain`),
totalSpores_sum = sum(`Total Spores (M)`),
totalSpores_mean = mean(`Total Spores (M)`),
totalSpores_sd = sd(`Total Spores (M)`))
## # A tibble: 10 x 4
## `Spore Strain` totalSpores_sum totalSpores_mean totalSpores_sd
## <chr> <dbl> <dbl> <dbl>
##  1 AWRm78 21 3.5 0
##  2 ERTm2 8 0.667 0.651
##  3 ERTm5 232. 2.76 2.52
##  4 ERTm5-96H 21 1.75 1.49
##  5 LUAm1 923 6.79 8.05
##  6 LUAm1-HK 20 10 0
##  7 LUAm1-pel 20 10 0
##  8 LUAm1-sup 20 10 0
##  9 LUAm3 60 10 0
## 10 MAM1 104 7.43 3.46
Which option looks more “readable” to you? Piping or nesting functions?
mutate() to create new columns in your data frame
Speaking of creating columns, let’s explore the mutate() function. mutate() creates new columns, most often the product of a calculation or a concatenation of information. For example, let’s concatenate names from some of the columns by putting the Spore Strain and Spore Lot columns together with the paste() function. We can keep the result in a new column, spore_strain_lot.
# Start with our tibble
meta_trimmed.tbl %>%
# Use the mutate command to paste two sets of column information together
mutate(spore_strain_lot = ...(`Spore Strain`, `Spore Lot`, sep = "_")) %>%
# Peek at the result.
head()
## Error in `mutate()`:
## i In argument: `spore_strain_lot = ...(`Spore Strain`, `Spore Lot`, sep = "_")`.
## Caused by error in `...()`:
## ! could not find function "..."
Up to this point we’ve been doing a lot of piping with %>%, and while we can see the results in the output of our code, we have NOT been saving those results to a variable.
This has two consequences:
Your original data remains untouched - the displayed results are discarded rather than stored anywhere.
If you want to save your data - perhaps after figuring out the series of steps you want to implement - you need to assign it to a variable, or at least pipe it to a write*() function to save it on disk.
Unlike the mutate() command, we can also directly and permanently alter our data structure by adding new columns. New columns can be easily created using the $col_name syntax: if the column does not already exist, it will be created; otherwise its data will be overwritten.
# adding columns can also be done using "base R" code:
# This will permanently change meta_trimmed.tbl
meta_trimmed.tbl$... <- paste(meta_trimmed.tbl$`Spore Strain`,
meta_trimmed.tbl$`Spore Lot`,
sep = "_")
head(meta_trimmed.tbl)
## # A tibble: 6 x 11
## experiment Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
## <chr> <chr> <chr> <chr> <dbl>
## 1 190426_VC20019_LUAm~ VC20019 LUAm1 2A 0
## 2 190426_VC20019_LUAm~ VC20019 LUAm1 2A 10
## 3 190426_VC20019_LUAm~ VC20019 LUAm1 2A 20
## 4 190426_N2_LUAm1_0M_~ N2 LUAm1 2A 0
## 5 190426_N2_LUAm1_10M~ N2 LUAm1 2A 10
## 6 190426_N2_LUAm1_20M~ N2 LUAm1 2A 20
## # i 6 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## # `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>, ... <chr>
select() to remove columns
We previously saw how to use select() to get a subgroup of columns we want, but we can also use it to “remove” columns. Note how our last call made a permanent change to meta_trimmed.tbl. To exclude the variable spore_strain_lot from meta_trimmed.tbl, we can use select() and then overwrite meta_trimmed.tbl. Simply add a - (minus) in front of spore_strain_lot.
# Check the column names before and after removing `spore_strain_lot`
colnames(meta_trimmed.tbl)
## [1] "experiment" "Worm_strain" "Spore Strain" "Spore Lot"
## [5] "Total Spores (M)" "Spores(M)/cm2" "Infection Date" "Fixing Date"
## [9] "Staining Date" "Imaging Date" "..."
meta_trimmed.tbl <- select(meta_trimmed.tbl, ...) # remove column spore_strain_lot
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
head(meta_trimmed.tbl)
## # A tibble: 6 x 11
## experiment Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
## <chr> <chr> <chr> <chr> <dbl>
## 1 190426_VC20019_LUAm~ VC20019 LUAm1 2A 0
## 2 190426_VC20019_LUAm~ VC20019 LUAm1 2A 10
## 3 190426_VC20019_LUAm~ VC20019 LUAm1 2A 20
## 4 190426_N2_LUAm1_0M_~ N2 LUAm1 2A 0
## 5 190426_N2_LUAm1_10M~ N2 LUAm1 2A 10
## 6 190426_N2_LUAm1_20M~ N2 LUAm1 2A 20
## # i 6 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## # `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>, ... <chr>
transmute() to create a new data.frame
transmute() is along the same vein as mutate(), as it will also create a new column (variable). However, it drops the existing columns and gives you a single column for each new one specified. The output of transmute() is a tibble of your new variable(s).
meta_trimmed.tbl %>%
# Transmute some new columns
...(spore_strain_lot = paste(`Spore Strain`, `Spore Lot`, sep = "_")) %>%
# look at the unique combinations
unique()
## Error in ...(., spore_strain_lot = paste(`Spore Strain`, `Spore Lot`, : could not find function "..."
Notice that you can accomplish much of what summarise() does using transmute(), but there are some small differences in the context of a grouped data frame.
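To make that grouped-context difference concrete, here is a minimal sketch with a toy tibble (the strain and spores values below are invented for illustration): summarise() collapses each group to a single row, while transmute() keeps one row per input row and, on a grouped tibble, retains the grouping column even when you don't ask for it.

```r
library(dplyr)

# Toy data standing in for our metadata (values are made up)
dat <- tibble(strain = c("A", "A", "B", "B"),
              spores = c(0, 10, 20, 30))

# summarise() collapses to one row per group
grp_summary <- dat %>% group_by(strain) %>% summarise(mean_spores = mean(spores))
grp_summary

# transmute() keeps one row per observation; on a grouped tibble
# the grouping column `strain` is retained even though we didn't ask for it
grp_transmute <- dat %>% group_by(strain) %>% transmute(mean_spores = mean(spores))
grp_transmute
```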
It is up to you whether you want to keep your data in a tibble or
switch to a vector if you are dealing with a single column of data (aka
variable). Using a dplyr function will maintain your data
in a tibble structure. Using non-dplyr functions will
switch your data to a vector if you have a 1-dimensional output.
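As a quick illustration of that distinction (toy tibble, invented values): select() returns a one-column tibble, while pull() or the base `$` operator hand back a plain vector.

```r
library(dplyr)

toy <- tibble(x = c(1, 2, 3))

# select() preserves the tibble structure (a 1-column tibble)
as_tbl <- select(toy, x)

# pull() and `$` drop down to a plain numeric vector
as_vec <- pull(toy, x)
identical(as_vec, toy$x)
```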
Comprehension Question 4.0.0: Using our table meta_trimmed.tbl determine how many different combinations of C. elegans and microsporidia strains were tested (regardless of dosage or other factors). What are the top 10 most common combinations? Hint: use the group_by() %>% summarise() paradigm and check out the n() function.
# comprehension answer code 4.0.0
meta_trimmed.tbl %>%
... %>%
... %>%
... %>%
...
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
You’ve gone through all that trouble of learning how to import, filter, slice, and sort our datasets. Now comes the time to make sure that work doesn’t go to waste. In larger scripts, there may be intermediate files you want to save in case an error occurs further along; saving them can also give you a sense of how things are progressing. Whether it is an intermediate or final dataset that you would like to keep, it’s time to learn how to save your files.
write_csv()
We’re ready to write meta_trimmed.tbl, or any other data frame for that matter, to file. In this case we won’t overwrite our old data set but rather create a second version of it.
Note that there are many ways to write data frames to files, including writing back to excel files! First we’ll keep it simple and stay within the tidyverse with write_csv(), which is a derivative of the write_delim() function. The write_csv() function includes some of the following parameters:
x: the data structure you’d like to write to file - preferably a tibble or data.frame.
file: the file path where you are sending the output.
na: a character string used for NA values - defaults to “NA”.
append: logical argument with FALSE as default (overwrites an existing file) or TRUE to append to an existing file. If the file doesn’t exist, either setting writes a new file.
col_names: logical argument to include the column names as part of the file. If unspecified, it will take the opposite value of append.
getwd()
## [1] "C:/Users/mokca/Dropbox/!CAGEF/Course_Materials/Introduction_to_R/2025.09_Intro_to_R/lecture_02_introduction_to_dplyr"
# Write our data to file
...(x = meta_trimmed.tbl,
file = "data/infection_metadata_trimmed.csv",
col_names=TRUE)
## Error in ...(x = meta_trimmed.tbl, file = "data/infection_metadata_trimmed.csv", : could not find function "..."
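The skeleton above is left blank for class; as a self-contained sketch of the same call, here is the pattern with a throwaway data frame and a temporary file path (the names and values are invented, so nothing in data/ is touched):

```r
library(readr)

toy <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Write to a temporary location so we don't clobber real data
tmp <- file.path(tempdir(), "toy.csv")
write_csv(x = toy, file = tmp, col_names = TRUE)

# Round-trip check: read it back in
roundtrip <- read_csv(tmp, show_col_types = FALSE)
roundtrip
```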
%>% to direct your output to write_csv()
That’s right, you can pipe your data from filtering etc. over to write_csv(). While you may think this is usually the last step in your pipeline, it will actually write the data to file and then pass the input forward through the next pipe.
This has two implications: (1) you can keep piping after writing, for example to save an intermediate file partway through a pipeline, and (2) you can assign the final output of a pipeline that ends in write_csv() to an object.
Let’s revisit our last summarizing pipeline.
... <-
# Pass along trimmed data
meta_trimmed.tbl %>%
# group by Spore strain
group_by(., `Spore Strain`) %>%
# Summarise the data now
summarise(.,
totalSpores_sum = sum(`Total Spores (M)`),
totalSpores_mean = mean(`Total Spores (M)`),
totalSpores_sd = sd(`Total Spores (M)`)) %>%
# write your file to output
write_csv(x = ., file="data/infection_metadata_summary.csv", col_names=TRUE)
## Error in `as_tibble()`:
## ! Column 11 must not have names of the form ... or ..j.
## Use `.name_repair` to specify repair.
## Caused by error in `repaired_names()`:
## ! Names can't be of the form `...` or `..j`.
## ✖ These names are invalid:
## * "..." at location 1.
# Take a look at the result of the pipeline
write_result
## Error in eval(expr, envir, enclos): object 'write_result' not found
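Because write_csv() returns its input (invisibly), a pipeline that ends in write_csv() can still be assigned to an object. A self-contained sketch with toy data (the names and a temporary path are invented):

```r
library(dplyr)
library(readr)

tmp <- file.path(tempdir(), "summary.csv")

result <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3)) %>%
  group_by(g) %>%
  summarise(total = sum(x)) %>%
  # write_csv() writes the file AND passes the summarised tibble along...
  write_csv(file = tmp)

# ...so `result` holds the same summary that was written to disk
result
```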
write_xlsx()
Sometimes you may want to write multiple data frames to a single file, like the xlsx format with its sheets. This can be a convenient way to keep data together rather than making multiple write_csv() commands.
The writexl package contains the write_xlsx() function, which can write the contents of a named list of data frames to multiple sheets. This function includes the following parameters:
x: a data.frame, tibble, or a named list of data frames
path: the path to write the .xlsx file to
col_names: logical parameter for whether or not to write column names at the top of each sheet
Let’s give it a try to wrap up today’s lecture!
# install.packages("writexl", dependencies = TRUE)
# library(writexl)
# Write a list to a single xlsx file
...(x = list("..." = infection_meta.tbl, "metadata_summary" = write_result),
path = "data/metadata_analysis.xlsx",
col_names = TRUE
)
## Error in ...(x = list(... = infection_meta.tbl, metadata_summary = write_result), : could not find function "..."
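For reference, here is a self-contained sketch of the write_xlsx() call with two toy data frames, invented sheet names, and a temporary path; it assumes the writexl package is installed, as in the lecture:

```r
library(writexl)

sheet1 <- data.frame(a = 1:2)
sheet2 <- data.frame(b = 3:4)

xlsx_path <- file.path(tempdir(), "metadata_analysis.xlsx")

# A named list becomes one sheet per element
write_xlsx(x = list(raw = sheet1, summary = sheet2),
           path = xlsx_path,
           col_names = TRUE)

file.exists(xlsx_path)
```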
That’s a wrap for our second class on R! You’ve made it through and we’ve learned about the following:
the dplyr package.
At the end of this lecture a Quercus assignment portal will be available for you to submit an RMD version of your completed skeletons from today (including the comprehension question answers!). These will be due one week later, before the next lecture. Each lecture skeleton is worth 2% of your final grade, but a bonus 0.5% will also be awarded for submissions made within 24 hours of the end of lecture (i.e., 1600 hours the following day). To save your notebook:
Soon after the end of each lecture, a homework assignment will be available for you in DataCamp. Your assignment is to complete chapters from the Data Manipulation with dplyr course: Transforming data with dplyr (900 points); Aggregating data (1050 points); and Selecting and transforming data (750 points) for a total of 2700 points. This is a pass-fail assignment, and in order to pass you need to achieve at least 2025 points (75%) of the total possible points. Note that when you take hints from the DataCamp chapter, it will reduce your total earned points for that chapter.
In order to properly assess your progress on DataCamp, at the end of each chapter, please print a PDF of the summary. You can do so by following these steps:
Navigate to the Learn section along the top menu bar of DataCamp. This will bring you to the various courses you have been assigned under My Assignments.
Click the VIEW CHAPTER DETAILS link. Do this for all sections on the page!
Use ctrl + A to highlight all of the visible text.
You may need to take several screenshots if you cannot print it all in a single try. Submit the file(s) or a combined PDF for the homework to the assignment section of Quercus. By submitting your scores for each section and chapter, we can keep track of your progress, identify knowledge gaps, and produce a standardized way for you to check on your assignment “grades” throughout the course.
You will have until 1259 hours on Tuesday, September 16th to submit your assignment (right before the next lecture).
Revision 1.0.0: materials prepared in R Markdown by Oscar Montoya, M.Sc. Bioinformatician, Education and Outreach, CAGEF.
Revision 1.1.0: edited and prepared for CSB1020H F LEC0142, 09-2021 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
Revision 1.1.1: edited and prepared for CSB1020H F LEC0142, 09-2022 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
Revision 1.1.2: edited and prepared for CSB1020H F LEC0142, 09-2023 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
Revision 1.2.0: edited and prepared for CSB1020H F LEC0142, 09-2024 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
Revision 1.2.1: edited and prepared for CSB1020H F LEC0142, 09-2025 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
This class is supported by DataCamp, the most intuitive learning platform for data science and analytics. Learn any time, anywhere and become an expert in R, Python, SQL, and more. DataCamp’s learn-by-doing methodology combines short expert videos and hands-on-the-keyboard exercises to help learners retain knowledge. DataCamp offers 350+ courses by expert instructors on topics such as importing data, data visualization, and machine learning. They’re constantly expanding their curriculum to keep up with the latest technology trends and to provide the best learning experience for all skill levels. Join over 6 million learners around the world and close your skills gap.
Your DataCamp academic subscription grants you free access to the DataCamp’s catalog for 6 months from the beginning of this course. You are free to look for additional tutorials and courses to help grow your skills for your data science journey. Learn more (literally!) at DataCamp.com.
https://googlesheets4.tidyverse.org/
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Syntax.html
http://stat545.com/block009_dplyr-intro.html
http://stat545.com/block010_dplyr-end-single-table.html
http://stat545.com/bit001_dplyr-cheatsheet.html
http://dplyr.tidyverse.org/articles/two-table.html
https://cran.r-project.org/web/packages/writexl/writexl.pdf
You may find, for one reason or another, that you prefer to use the base commands of R to import data. Here you’ll find a quick primer on using the read.csv() function.
read.csv()
Let’s read our infection_meta.csv data file into R. While we do these exercises, we are going to become friends with the help() function. Let’s start by using the read.csv() function, which is actually a simplified version of the function read.table(). Both of these functions are part of the base utils package in R, which is loaded automatically. The read.csv() function includes, but is not limited to, the following parameters:
file: the file name we want to import
header: logical parameter noting whether your imported table has a header. Uses TRUE as the default value.
sep: character parameter denoting how your fields are separated. Uses , as the default value.
library(tidyverse)
# Remember the head() function? We'll import our file but just look at the first 6 rows of it
head(read.csv("data/infection_meta.csv"))
## experiment experimenter description
## 1 190426_VC20019_LUAm1_0M_72hpi CM Wild isolate phenoMIP retest
## 2 190426_VC20019_LUAm1_10M_72hpi CM Wild isolate phenoMIP retest
## 3 190426_VC20019_LUAm1_20M_72hpi CM Wild isolate phenoMIP retest
## 4 190426_N2_LUAm1_0M_72hpi CM Wild isolate phenoMIP retest
## 5 190426_N2_LUAm1_10M_72hpi CM Wild isolate phenoMIP retest
## 6 190426_N2_LUAm1_20M_72hpi CM Wild isolate phenoMIP retest
## Infection.Date Plate.Number Worm_strain Total.Worms Spore.Strain Spore.Lot
## 1 190423 1 VC20019 1000 LUAm1 2A
## 2 190423 2 VC20019 1000 LUAm1 2A
## 3 190423 3 VC20019 1000 LUAm1 2A
## 4 190423 4 N2 1000 LUAm1 2A
## 5 190423 5 N2 1000 LUAm1 2A
## 6 190423 6 N2 1000 LUAm1 2A
## Lot.concentration Total.Spores..M. Total.ul.spore Infection.Round
## 1 176000 0 0.00000 1
## 2 176000 10 56.81818 1
## 3 176000 20 113.63636 1
## 4 176000 0 0.00000 1
## 5 176000 10 56.81818 1
## 6 176000 20 113.63636 1
## X40X.OP50..mL. Plate.Size Spores.M..cm2 Time.plated Time.Incubated Temp
## 1 0.15 6 0.0000000 1300 1600 21
## 2 0.15 6 0.3538570 1300 1600 21
## 3 0.15 6 0.7077141 1300 1600 21
## 4 0.15 6 0.0000000 1300 1600 21
## 5 0.15 6 0.3538570 1300 1600 21
## 6 0.15 6 0.7077141 1300 1600 21
## timepoint infection.type Fixing.Date Location Staining.Date
## 1 72 continuous 190426 Sample exhausted 190513
## 2 72 continuous 190426 Sample exhausted 190513
## 3 72 continuous 190426 Sample exhausted 190513
## 4 72 continuous 190426 Sample exhausted 190430
## 5 72 continuous 190426 Sample exhausted 190513
## 6 72 continuous 190426 Sample exhausted 190513
## Stain.type Slide.date Slide.number Slide.Box Imaging.Date
## 1 Sp.9 FISH + DY96 190515 1 2 190516
## 2 Sp.9 FISH + DY96 190515 2 2 190516
## 3 Sp.9 FISH + DY96 190515 3 2 190516
## 4 DY96 190501 4 2 190502
## 5 Sp.9 FISH + DY96 190515 5 2 190516
## 6 Sp.9 FISH + DY96 190515 6 2 190516
# Note that unlike read_csv() the result here is strictly a dataframe
str(read.csv("data/infection_meta.csv"))
## 'data.frame': 276 obs. of 29 variables:
## $ experiment : chr "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
## $ experimenter : chr "CM" "CM" "CM" "CM" ...
## $ description : chr "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
## $ Infection.Date : int 190423 190423 190423 190423 190423 190423 190423 190423 190423 190423 ...
## $ Plate.Number : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Worm_strain : chr "VC20019" "VC20019" "VC20019" "N2" ...
## $ Total.Worms : int 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Spore.Strain : chr "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
## $ Spore.Lot : chr "2A" "2A" "2A" "2A" ...
## $ Lot.concentration: int 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
## $ Total.Spores..M. : num 0 10 20 0 10 20 0 10 20 0 ...
## $ Total.ul.spore : num 0 56.8 113.6 0 56.8 ...
## $ Infection.Round : int 1 1 1 1 1 1 1 1 1 1 ...
## $ X40X.OP50..mL. : num 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
## $ Plate.Size : int 6 6 6 6 6 6 6 6 6 6 ...
## $ Spores.M..cm2 : num 0 0.354 0.708 0 0.354 ...
## $ Time.plated : int 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
## $ Time.Incubated : int 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
## $ Temp : int 21 21 21 21 21 21 21 21 21 21 ...
## $ timepoint : chr "72" "72" "72" "72" ...
## $ infection.type : chr "continuous" "continuous" "continuous" "continuous" ...
## $ Fixing.Date : int 190426 190426 190426 190426 190426 190426 190426 190426 190426 190426 ...
## $ Location : chr "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
## $ Staining.Date : int 190513 190513 190513 190430 190513 190513 190430 190513 190513 190430 ...
## $ Stain.type : chr "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
## $ Slide.date : int 190515 190515 190515 190501 190515 190515 190501 190515 190515 190501 ...
## $ Slide.number : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Slide.Box : int 2 2 2 2 2 2 2 2 2 2 ...
## $ Imaging.Date : int 190516 190516 190516 190502 190516 190516 190502 190516 190516 190502 ...
NA values
In addition to the functions we discussed in class, there are some additional methods for dealing with NA values that can be helpful, depending on the structure of your data.
# Set up our data structures again
na_vector <- c(5, 6, NA, 7, 7, NA)
na_vector
## [1] 5 6 NA 7 7 NA
# A data.frame with NA values
counts <- data.frame(Site1 = c(geneA = 2, geneB = 4, geneC = 12, geneD = 8),
Site2 = c(geneA = 15, geneB = NA, geneC = 27, geneD = 28),
Site3 = c(geneA = 10, geneB = 7, geneC = 13, geneD = NA))
counts
## Site1 Site2 Site3
## geneA 2 15 10
## geneB 4 NA 7
## geneC 12 27 13
## geneD 8 28 NA
The na.omit() function will remove NA entries
In addition to our combination of functions from class, the na.omit() function can return an object where the NA values have been deleted in a listwise manner. For a data.frame, this means any row containing an NA is removed, keeping only the complete cases. Keeping this in mind, you can also use this on a vector.
# roughly equivalent to our previous, more complex code using is.na() and which() in combination
na.omit(na_vector)
## [1] 5 6 7 7
## attr(,"na.action")
## [1] 3 6
## attr(,"class")
## [1] "omit"
# But under the hood it is doing something slightly different
# see how it works on data.frames?
na.omit(counts)
## Site1 Site2 Site3
## geneA 2 15 10
## geneC 12 27 13
# Apply the log function to each row; note that the NA observations simply propagate through log() as NA
#?na.omit
apply(counts, MARGIN = 1, na.omit(log))
## geneA geneB geneC geneD
## Site1 0.6931472 1.386294 2.484907 2.079442
## Site2 2.7080502 NA 3.295837 3.332205
## Site3 2.3025851 1.945910 2.564949 NA
# Read more about apply() to learn more about why our data.frame is now transposed
You can deal with NaNs in R in a similar way. NaNs (not a number) are NAs (not available), but NAs are not NaNs. NaNs appear for imaginary or complex numbers or unusual numeric values (for example, 0/0). Some packages may output NAs, NaNs, or Inf/-Inf (which can be detected with is.finite()).
na_vector <- c(5, 6, NA, 7, 7, NA)
nan_vector <- c(5, 6, NaN, 7, 7, 0/0)
is.na(na_vector)
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
is.na(nan_vector)
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
is.nan(nan_vector)
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
# These types of operations are very useful when working with conditional statements (if else, while, etc.).
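For example, one common pattern (a sketch, not part of the course data) is replacing NA/NaN entries with a sentinel value before downstream arithmetic; is.na() catches both NA and NaN, so it works directly inside ifelse():

```r
nan_vector <- c(5, 6, NaN, 7, 7, 0/0)

# is.na() is TRUE for both NA and NaN entries
cleaned <- ifelse(is.na(nan_vector), 0, nan_vector)
cleaned

# Now arithmetic is safe without na.rm = TRUE
sum(cleaned)
```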